DOCUMENT RESUME

ED 127 330                                              TM 005 415

AUTHOR          Rentz, R. Robert; Bashaw, W. L.
TITLE           Equating Reading Tests With the Rasch Model. Volume
                I, Final Report.
INSTITUTION     Georgia Univ., Athens. Educational Research Lab.
SPONS AGENCY    National Center for Education Statistics (DHEW),
                Washington, D.C.
PUB DATE        Sep 75
CONTRACT        OEC-0-72-5237
NOTE            274p.; For related documents, see TM 005 [illegible]
                and 416; tables may reproduce poorly

EDRS PRICE      MF-$0.83 HC-$14.05 Plus Postage.
DESCRIPTORS     Comparative Analysis; Elementary Education; *Equated
                Scores; Goodness of Fit; *Item Analysis;
                *Mathematical Models; *Probability; Raw Scores;
                Reading Comprehension; *Reading Tests; Standard Error
                of Measurement; Standardized Tests; Statistical
                Analysis; Test Reliability; Vocabulary
IDENTIFIERS     *Anchor Test Study; *Rasch Model



ABSTRACT

In order to determine if Rasch Model procedures have any utility for
equating pre-existing tests, this study reanalyzed the data from the
equating phase of the Anchor Test Study, which used a variety of
equipercentile and linear model methods. The tests involved included
seven reading test batteries, each having from one to three levels and
two forms, and each having a vocabulary and comprehension subtest.
There were 28 form-level combinations possible. Therefore, of concern
was the simultaneous equating of 28 tests for each of vocabulary,
comprehension, and total scores. Seven objectives characterized the
study and are elaborated on in separate sections of this report. The
objectives were to: (1) describe a methodology for test equating using
the Rasch Model, (2) describe basic item analysis data for each test
in the Anchor Test Study data base, (3) evaluate the fit of the Rasch
Model with respect to those tests that were part of the data base,
(4) investigate the stability of Rasch Model parameter estimates under
conditions of varying sample size and sample composition, (5) provide
tables of equated scores based on Rasch Model methods, (6) estimate
the equating error associated with the use of these equating methods,
and (7) compare the results of equating with those obtained in the
Anchor Test Study. (RC)






EQUATING READING TESTS
WITH THE RASCH MODEL

Volume I, Final Report



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



EDUCATIONAL RESEARCH LABORATORY
College of Education
University of Georgia
Athens, Georgia 30602







EQUATING READING TESTS WITH THE RASCH MODEL

FINAL REPORT



R. Robert Rentz
W. L. Bashaw

with the assistance of

Carolyn Cartledge
S. Leellen Brigman



Educational Research Laboratory
College of Education
University of Georgia
Athens, Georgia

September, 1975



This report was prepared under Contract No. OEC-0-72-5237 by the
Educational Research Laboratory of the University of Georgia for
the National Center for Education Statistics, Division of Education,
U. S. Department of Health, Education, and Welfare. The
contractor was encouraged to exercise professional judgment in
its contents; therefore, this report does not necessarily reflect
positions and policies of the Government.






Table of Contents

Chapter 1: Introduction
    1.1   Objectives
    1.2   Background and Significance
    1.3   The Data Base
    1.4   A Frame of Reference for the Rasch Model

Chapter 2: Evaluating Fit to the Rasch Model
    2.1   The Problem of Model-Data Fit
    2.2   Stability of Parameter Estimates
    2.3   Stability as a Function of Sample Size
    2.4   Stability over Occurrence in the Design
    2.5   Stability as a Function of Sample Composition
    2.6   Describing Test Fit
    2.7   Relationship of Test Fit Indexes

Chapter 3: Equating Methodology
    3.1   General Principles
    3.2   Estimating Equating Constants: Methodology and Results
    3.3   Raw Score Equating
    3.4   Error Problems
    3.5   Vocabulary Constant Error Variance
    3.6   Parallel Forms Constants Error Variance
    3.7   Comprehension Constant Error Variances
    3.8   Crude Estimates of Error Variances
    3.9   Results of Applying Equations
    3.10  Comments on Errors
    3.11  Equating Errors for the Ability Method

Chapter 4: Equipercentile and the Rasch Model: A Comparison of the Results
    4.1   The Methodology
    4.2   Data Organizations and the Presentations of the Equating Results
    4.3   Comparison of the Results
    4.4   Concluding Comments

Chapter 5: Summary and Conclusions
    5.1   Some Comments
    5.2   The National Reference Scale for Reading
    5.3   Estimating National Reference Scale Scores from any
          Collection of Items
    5.4   Summary

References

Appendices
    Appendix A: Stability of Parameter Estimates over Occurrence
                in the Design
    Appendix B: Stability of Ability Estimates as a Function of
                Sample Composition
    Appendix C: Equating Tables - Vocabulary
    Appendix D: Equating Tables - Comprehension
    Appendix E: Assignment Errors - Vocabulary
    Appendix F: Assignment Errors - Vocabulary
    Appendix G: FORTRAN Program for Producing NRS Scores for any
                Collection of Items



List of Tables

1.3.1   Data Set Numbers, Test Description Codes, and Number of Items
1.3.2   Sample Sizes for the Cells in the Equating Design Matrix
2.3.1   Stability of Item Parameter Estimates as a Function of
        Sample Size (N=500)
2.3.2   Stability of Item Parameter Estimates as a Function of
        Sample Size (N=1000)
2.3.3   Stability of Item Parameter Estimates as a Function of
        Sample Size (N=2000)
2.3.4   Stability of Item Parameter Estimates as a Function of
        Sample Size (N=4000)
2.3.5   Stability of Ability Parameter Estimates as a Function of
        Sample Size (N=500)
2.3.6   Stability of Ability Parameter Estimates as a Function of
        Sample Size (N=1000)
2.3.7   Stability of Ability Parameter Estimates as a Function of
        Sample Size (N=2000)
2.3.8   Stability of Ability Parameter Estimates as a Function of
        Sample Size (N=4000)
2.3.9   Average Stability Indexes for STEP II-Vocabulary for
        Calibration Situations Differing as a Function of Sample Size
2.4.1   Stability Indexes for Item Parameter Estimates for the 14
        Primary Forms When the Tests Were Administered First and Second
2.4.2   Stability Indexes for Ability Parameter Estimates for the 14
        Primary Forms When the Tests Were Administered First and Second
2.5.1   Sample Composition Variables, Codes, and Descriptions as
        Contained on the Anchor Test Study Data Tapes
2.5.2   Description of Sample Composition Analysis Groups for STEP
        Vocabulary
2.5.3   Sample Sizes on Each Vocabulary Test for Sample Composition
        Analyses by Race and IQ
2.5.4   Ability Parameter Stability Indexes for 14 Tests Over Samples
        Differing in Compositions
2.6.1   Slope Indexes of Fit and Frequency Distributions of Slope
        Values for the Primary Form Vocabulary Tests
2.6.2   Average Item Mean Square Fit Values for the 14 Primary Form
        Vocabulary Tests
2.6.3   Adjusted Median Mean Square Fit Values for All Tests in the
        Data Base
2.7.1   Summary of the Various Descriptive Index of Test Fit for the
        14 Primary Form Vocabulary Tests
3.2.1   Fourth Grade Vocabulary Difference Matrix
3.2.2   Fifth Grade Vocabulary Difference Matrix
3.2.3   Sixth Grade Vocabulary Difference Matrix
3.2.4   Fourth Grade Comprehension Difference Matrix
3.2.5   Fifth Grade Comprehension Difference Matrix
3.2.6   Sixth Grade Comprehension Difference Matrix
3.2.7   A Set of Equating Constants for Vocabulary
3.2.8   A Set of Equating Constants for Comprehension
3.2.9   Equating Constants Recommended for Use
3.9.1   Vocabulary Equating Constants Standard Errors
4.3.1   Base Test: SAT Int. I Form W
        Equated Test: ITBS Level 10 Form 5
4.3.2   Base Test: MAT Level E Form F
        Equated Test: SAT Int. I Form W
4.3.3   Base Test: MAT Level E Form F
        Equated Test: SRA Blue Form E
4.3.4   Base Test: SAT Int. II Form W
        Equated Test: MAT Int. Form F
4.3.5   Base Test: SAT Int. II Form W
        Equated Test: MAT Int. Form F
4.3.6   Base Test: MAT Int. Form F
        Equated Test: SAT Int. II Form W
4.3.7   Base Test: MAT Int. Form F
        Equated Test: SAT Int. II Form W
4.3.8   Base Test: SRA Green Form W
        Equated Test: CAT Level 4 Form A
4.3.9   Base Test: CAT Level 4 Form A
        Equated Test: ITBS Level 12 Form 5
5.1.1   A Comparison of Various Error Sources for a Raw Score of 19
        on STEP II Level 3 Form A Vocabulary


Preface

The present two volumes represent the major output of a project
designed to discover whether or not the Rasch Model has any utility in
the equating of pre-existing tests. Essentially, we reanalyzed the
data from the equating phase of the Anchor Test Study. We believe
that in terms of Rasch Model research, this study is the largest
(number of tests included and sample size) to date where attempts have
been made to use the model outside of a test construction context. We
have, however, provided some guidelines for those interested in test
construction.

There are many aspects of the data that are not fully exploited
in these reports. We have tried to write these volumes from a rather
narrow perspective when dealing with the Rasch Model; yet, we have tried
to achieve some degree of comprehensiveness with respect to the equating
process. There are many things yet to be learned and much left to do.
If some of the ideas presented here spark interest, we invite you to
join us.

A number of people have contributed to this effort. First and
foremost, Dr. Charles H. Hammer, our U. S. O. E. project officer, has
been most patient and helpful. He has always exhibited a high degree
of professionalism.

Our respect and appreciation go to Dr. Peter Loret and his colleagues
at ETS who produced the Anchor Test Study. They have made a significant
contribution. When we received the data from Peter on computer tapes,
each tape contained exactly the information he said it contained, and
each record contained all the information that was supposed to be there.
Those of you with some data processing experience know that this is
not a small achievement.

The project funding enabled us to secure the consultant services
of Professor Benjamin D. Wright and Professor Georg Rasch. The ten
days they spent with us were productive and insightful. Georg Rasch
is a remarkable man. He has energy, enthusiasm and a respectful
attitude toward data. Ben Wright continues to assist us in our work.
Many of the specific procedures used here, especially with regard to
equating, were suggested by him. His intellectual contribution to this
study was invaluable.

Finally, we wish to thank the staff of the Educational Research
Laboratory. Many of them have made contributions and provided assistance.
Donna Wortley typed the many versions of the manuscript and is happy that
we finally decided what to say. She has made our job much easier
than it could have been.

Chapter 1
Introduction

In 1960 Georg Rasch published a book, Probabilistic Models for
Some Intelligence and Attainment Tests, in which he described several
mathematical models for representing responses to test questions. One of
these models, which Rasch calls the simple item analysis model, has
become popularly known as "the Rasch Model".¹

While a smattering of research on the Rasch Model appeared in this
country between 1960 and 1967, it was not until the 1967 ETS Invitational
Conference on Testing that the interest of the American measurement
community was stirred. Professor Benjamin Wright's paper on "Sample-free
Test Calibration and Person Measurement", presented at that conference,
has probably served to popularize the Rasch Model more than any other
work. Research dated since the Wright paper now numbers well over 300
papers. Proponents of the Rasch Model advocate its use in test development
largely on the basis that the model promises to achieve two important
consequences long deemed desirable by psychometricians: (1) item
calibrations that are independent of the calibrating sample; and
(2) person measurement that is independent of a specific set of items.

There are several implications of these consequences: (1) Any
appropriate collection of individuals can be used in the calibration
process as opposed to resorting to some elaborate sampling plan; (2)
given a pool of calibrated items, any subset of that pool can be used
to measure an individual. All such subsets will estimate ability on
a common scale. This latter condition greatly simplifies the problem
of equating different tests and the former condition renders obsolete
the special attention given to sampling plans when, for example,
achievement tests are nationally standardized.² If different collections
of items (e.g., different tests) can be used to make measurements on a
common scale, then the fundamental problem of test equating is solved.

It is the primary purpose of this research to investigate the use
of the Rasch Model for equating reading achievement tests, specifically
those reading tests used in the Anchor Test Study (ATS) (Loret, Seder,
Bianchini & Vale, 1974). A number of objectives guided this reanalysis
of the Anchor Test Study. These objectives are summarized here and
elaborated upon in subsequent sections of this report.

¹ Throughout this paper we will employ the convention of referring
to Rasch's simple item analysis model as the "Rasch Model".

1.1 Objectives

1. To describe a methodology for test equating using the Rasch Model.

2. To provide basic item analysis data for each test in the Anchor
Test Study data base.

3. To evaluate the fit of the Rasch Model with respect to those
tests that were part of the data base.

4. To investigate the stability of Rasch Model parameter estimates
under conditions of varying sample size and sample composition.

5. To provide tables of equated scores based on Rasch Model methods.

6. To estimate the equating error associated with the use of the
above equating methods.

7. To compare the results of equating with those obtained in the
Anchor Test Study.

² The above comments do not refer to the collection of norms data but
only that process used in typical data collection activities for item
analysis and test equating purposes.

1.2 Background and Significance

Jaeger (1973) described quite well the motivation that gave rise
to the support by the USOE of a study to equate several of the most
commonly used reading achievement tests. His discussion of both the
scientific and practical merits of the national test-equating study in
reading (called the Anchor Test Study) points to four areas in which the
Anchor Test Study distinguishes itself:

1) Its fulfillment of a long-standing objective of the measurement
community, i.e., the equating of widely used achievement tests
in reading comprehension and vocabulary.

2) Its scope - it required administration of nearly 500,000
reading comprehension and vocabulary tests to over 300,000
children in 1,650 elementary schools in all 50 states.

3) Its widespread support - the study carried the endorsement of
the U. S. Commissioner of Education, 49 of the nation's
Chief State School Officers, and district superintendents and
principals representing more than 1,600 schools in all 50 states.

4) Its quality - it provides new national norms for the achievement
tests used, based on an unprecedented school cooperation rate
of over 90 percent and a sample more nearly representative of
children enrolled in U. S. public and private elementary
schools than ever before achieved.

The reanalysis of the Anchor Test Study data by Rasch Model techniques
is the first in a potential series of studies that will utilize the
Anchor Test Study data base in an attempt to extend our knowledge about
test equating in general. If, for example, it can be shown that Rasch
Model procedures can be used to equate tests like those used in the
Anchor Test Study, then considerable savings might be realized in future
equating efforts. Extensive data collection activities, elaborate
sampling plans and sophisticated analyses are expensive to execute,
yet these are the very elements for which the Anchor Test Study should be
credited. Unfortunately, the merits of the Anchor Test Study are also
the features that make such a study impractical for most organizations.

Of more specific interest here is the issue of whether or not Rasch
Model methods can be used with existing tests for test equating purposes.
In theory, equating with the Rasch Model is a simple and straightforward
concept, requiring only that student performance on both tests to be
equated can be reasonably described by the Rasch Model. Thus, an
important aspect of this study is the degree to which the tests used can
be considered to satisfy those conditions that allow appropriate use
of the Rasch Model. The issues of model-data fit are not simple. Compared
with the mechanics of item and test calibration and test equating, the
procedures for evaluating fit are much more elusive.

It seems quite clear at present that tests can be constructed to
the specifications of the Rasch Model. It is not at all clear what the
limits of utility are for existing, intact tests when those tests are
analyzed by Rasch Model procedures. In other words, to what extent
will the consequences of the model be achieved under conditions of less
than ideal correspondence with model specifications?

1.3 The Data Base

The data that were used in this study were collected specifically
for the Equating Phase of the Anchor Test Study. The purpose of the
Anchor Test Study was to provide a method for translation of a score on
one of seven widely used standardized reading tests to a score on any
of the other tests.

The United States Office of Education (USOE) initially determined
seven of the more frequently used reading achievement tests³ appropriate
for grades 4, 5, and 6. Two forms of each test were chosen, a primary
form (the one most frequently used) and its secondary (alternate) form.
Each reading test could be divided into a vocabulary subtest and a
comprehension subtest. Equating was independently performed for each
subtest and the total test at each of the three grade levels. This
required the administration of appropriate pairs of reading achievement
tests to fourth, fifth, and sixth grade students randomly selected from
public and non-public schools in the United States.

In the study reported here, the seven tests were considered as
twenty-eight tests by separating them into their various forms and
levels. (The test batteries and the various forms and levels are shown
in Table 1.3.1.) In this scheme of data organization, the particular
grade(s) in which a test was administered was disregarded. With STEP
having only one level, ITBS having three levels, and the other five
batteries having two levels, there are fourteen primary and secondary
forms that can be identified. Thus, the Rasch Study results are presented
with each of the fourteen primary forms used as a base test with its
secondary form and the other thirteen primary forms being equated to it.
Therefore, only fourteen equating tables are necessary, each showing the


³ After the inception of the study herein reported, an eighth test
was added to the ATS data base; that test was not available at the time
this study was begun and has not been included here.



TABLE 1.3.1

Data Set Numbers, Test Description Codes, and
Number of Items

          Test Description Codes               Number of Items*

DSN No.  Test Name  Form  Level     Grade    Vocabulary  Comprehension

  01     CAT        A     3         4,5          40           42
  02     CAT        B     3         4,5          40           42
  03     CAT        A     4         6            40           45
  04     CAT        B     4         6            40           45
  05     CTBS       Q     2         4,5          40           45
  06     CTBS       R     2         4,5          40           45
  07     CTBS       Q     3         6            40           45
  08     CTBS       R     3         6            40           45
  09     ITBS       5     10        4            38           68
  10     ITBS       6     10        4            38           68
  11     ITBS       5     11        5            43           74
  12     ITBS       6     11        5            43           74
  13     ITBS       5     12        6            46           76
  14     ITBS       6     12        6            46           76
  15     MAT        F     ELE.      4            50           45
  16     MAT        G     ELE.      4            50           45
  17     MAT        F     INT.      5,6          50           45
  18     MAT        G     INT.      5,6          50           45
  19     STEP II    A     4         4,5,6        30           30
  20     STEP II    B     4         4,5,6        30           30
  21     SRA        E     BLUE      4,5          42           48
  22     SRA        F     BLUE      4,5          42           48
  23     SRA        E     GREEN     6            42           48
  24     SRA        F     GREEN     6            42           48
  25     SAT        W     INT. I    4            38           60
  26     SAT        X     INT. I    4            38           60
  27     SAT        W     INT. II   5,6          48           64
  28     SAT        X     INT. II   5,6          48           64

* All items contain four alternatives except for items 35 through 42 in CAT Level
3 Form A and CAT Level 3 Form B and items 35 through 45 in CAT Level 4 Form A
and CAT Level 4 Form B, which contain five.



raw scores for the base test plus the equated raw scores for its
secondary form and the primary forms of the other thirteen tests.
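
The counts quoted in this section (twenty-eight tests, fourteen primary
forms, fourteen equating tables) follow from the levels listed in
Table 1.3.1. A short Python check is sketched below; the dictionary and
variable names are ours and are only illustrative.

```python
# Number of levels per battery, as described in the text:
# STEP has one level, ITBS has three, and the other five have two.
levels_per_battery = {
    "CAT": 2, "CTBS": 2, "ITBS": 3, "MAT": 2,
    "STEP II": 1, "SRA": 2, "SAT": 2,
}
FORMS_PER_LEVEL = 2  # a primary form and its secondary (alternate) form

# One primary form exists for every battery-level combination.
primary_forms = sum(levels_per_battery.values())
total_tests = primary_forms * FORMS_PER_LEVEL

# Each equating table lists the base test, its secondary form,
# and the primary forms of the other tests.
columns_per_table = 1 + 1 + (primary_forms - 1)

print("primary forms (one equating table each):", primary_forms)
print("tests in the data base:", total_tests)
print("score columns per equating table:", columns_per_table)
```

Under this reading, each of the equating tables carries one column for
the base test and one for every test equated to it.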

The target population of subjects for the ATS was all fourth, fifth,
and sixth grade students in the United States who would not be limited
in taking the tests by a physical or mental handicap or by not knowing
English; therefore, a Sampling Phase was necessary to establish
a national probability sample of grades 4, 5, and 6 to establish equating
relationships among the seven tests.

The Rasch Study required a reorganization of the data base that was
used for the ATS. Two methods of grouping data comprised the project
organization, with all data grouped without regard to the grade of the
examinees. First, a Basic File was created by grouping together all
test data for a single test, disregarding both pairing and order of
administration. This created 28 subfiles in the Basic File. This file
was used in the test calibration phase of the project, for item analysis,
and to study the model and its fit.

The second file, the Paired File, was created by grouping all
independent test pairs without regard to the grade of the students.
This grouping method yielded 136 subfiles of independent test pairs
as is shown by the sample sizes in Table 1.3.2. Each entry in this
design matrix represents that particular test pair identified by the
row and column index "DSN" (data set number). The row index identifies
the test administered first in the sequence. Notice that all indexes
are odd numbers. This was a processing convenience since there was some
advantage to keeping a test's primary and secondary form together in the
organizational sequence. Whenever the test was administered with its
alternate (secondary) form, it appears on the diagonal, which shows two


TABLE 1.3.2

Sample Sizes for the Cells in the Equating Design Matrix



DSN
No. 01 03 05 07 09 11 13 15 17 19 21 23 25 27



01 J^^^ 1986 775 821 9A3 9A0 1352 1A35 729 800 
990 

03 10A8 836 832 696 723 821 

05 1831 ^5^3 608 736 677 695 18A2 16A2 831 699 

07 80A l-^l 628 699 857 927 \ 782 

09 682 $97 JqjJ 752 680 623 823 

11 678 626 750 606 6A1 3A7 

- lOol * ,\ 



13 6AA 700 1131 



'i 



10A7 



592 612 889 875 



15 9A8 61A 717 ^75 

17 95A 911 65A 707 784 6A1 ' 196A 913 889 1A31 

19 1A92 693 1685 775 550 665 638 986 1855 2225 ^^^^ ^^^^ ^^^^ 



1732 



21 iA29 1658 616 7A7 915 805 1668 ^^^^g 808 819 



V 



705 



23 77A 9A8 971 952 912 ^ 



r 



25 876 756 916 58A 789 75A 

27 908 836 799* 709 819 90A 12A1 1622 870 969 



631 
630 



1331 
1472 



sample sizes. These numbers distinguish between the order of administration
of primary and secondary forms. The first number is the sample
size for the primary form administered first; the second number is for the
primary form administered second. The Paired File was used to estimate
the equating constants (see Chapter 3) and to study model-data fit.

1.4 A Frame of Reference for the Rasch Model

Georg Rasch (1960) proposed several models for achievement measures.
We are concerned with only his "simple item analysis model" which is
appropriate for a measure composed of a set of questions, scored correct
or incorrect.

Rasch (1973) proposed that all measurement be constructed with a
specific "frame of reference." That is, one should carefully consider
the population of persons to be measured (what Wright calls the
"target") and the domain of tasks that define the trait to be assessed.
Rasch (1966) proposed that one should create measurements such that
comparisons of persons in the target population or comparisons of tasks
in the task domain should be invariant with respect to the specific
sample of tasks or persons that one observes. This invariance property
was called "specific objectivity" to show that the invariance is limited
to the specific frame of reference. His work led to an important
conclusion; namely, that there exists one model for tests that is both
necessary and sufficient for yielding specific objectivity (Rasch, 1966;
Schmidt, 1970).

There are only two sets of parameters in the model: one for persons
(abilities) and one for items (easiness). The introduction of other
parameters will lead to other models and, thus, potentially to a loss of
specific objectivity.

Let a particular person be characterized by an ability parameter
ζ and a particular item be characterized by an easiness parameter ε.
Then the odds for the person responding correctly to the item is ζε.
This is the simplest form of the model. It can be rewritten as the
probability statement ζε/(1 + ζε).
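
The odds and probability forms of the model can be illustrated with a
minimal Python sketch. The function names and the numerical values are
ours, chosen only for illustration; they are not taken from the study.
The loop also shows the invariance idea in miniature: the ratio of the
odds for two persons is the same whichever item is used.

```python
def odds_correct(ability: float, easiness: float) -> float:
    """Odds of a correct response: the product of ability and easiness."""
    return ability * easiness


def prob_correct(ability: float, easiness: float) -> float:
    """Probability of a correct response: odds / (1 + odds)."""
    odds = odds_correct(ability, easiness)
    return odds / (1.0 + odds)


# Two hypothetical persons compared on three hypothetical items.
able_person, weak_person = 2.0, 0.5
for easiness in (0.25, 1.0, 4.0):
    p = prob_correct(able_person, easiness)
    ratio = odds_correct(able_person, easiness) / odds_correct(weak_person, easiness)
    print(f"easiness={easiness}: P(correct)={p:.3f}, odds ratio={ratio}")
```

The probabilities change from item to item, but the comparison of the
two persons (the odds ratio) does not depend on the item's easiness.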

If one considers all persons, that is, if ζ is considered a variable,
then the simple model is a model for the item characteristic curve of
the item. There are several well-known models for item characteristic
curves. Specifically, the normal ogive and the logistic function are
often used, wherein there are two or three item parameters (easiness,
discrimination, and guessing parameters). For example, see Lord and
Novick (1968) for a review of normal ogive and logistic function models.
It can be shown that Rasch's simple model is a logistic function with
only one item parameter, the easiness parameter. Thus, the literature on
item characteristic curves is relevant to the study of Rasch's Model.
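
The logistic form can be made explicit with a short derivation. The
symbols are our notation: ζ for the person's ability parameter, ε for
the item's easiness parameter, and θ and b for a log ability and a log
difficulty, which are introduced here as assumptions for illustration.

```latex
% Substitute \zeta = e^{\theta} (log ability \theta) and
% \varepsilon = e^{-b} (log difficulty b = -\ln\varepsilon).
\[
P(\text{correct})
  = \frac{\zeta\varepsilon}{1 + \zeta\varepsilon}
  = \frac{e^{\theta - b}}{1 + e^{\theta - b}}
  = \frac{1}{1 + e^{-(\theta - b)}} .
\]
```

Read as a function of θ, this is a logistic curve with unit slope and no
lower asymptote, so the item enters only through the single location
parameter b.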

The estimation of the two sets of parameters in the Rasch Model is
called "test calibration". Test calibration consists of obtaining two
sets of information. One is the easiness estimate for each item; the
second is a table giving an ability estimate corresponding to each
possible raw score. A procedure for obtaining these estimates is the
unconditional maximum likelihood procedure. The maximum likelihood
estimates are generated by MESAMAX, a computer program based on a
paper by Wright and Panchapakesan (1969). Volume II of this report
gives the calibrations for items and abilities for all tests. Volume
II also presents for comparison some traditional item analysis results.
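
The second product of calibration, the table of ability estimates by raw
score, can be sketched in Python under simplifying assumptions. The
sketch treats the item log-easiness values as already known and finds,
for each raw score, the log ability at which the expected score equals
that raw score (the maximum likelihood condition for this model). It is
not the MESAMAX program, which estimates items and abilities together;
the item values and function names below are hypothetical.

```python
import math


def expected_score(theta, log_easiness):
    """Expected raw score at log ability theta for the given items."""
    return sum(1.0 / (1.0 + math.exp(-(theta + e))) for e in log_easiness)


def ability_for_raw_score(raw, log_easiness, lo=-20.0, hi=20.0, tol=1e-10):
    """Log ability whose expected score equals the raw score (bisection)."""
    if not 0 < raw < len(log_easiness):
        raise ValueError("zero and perfect scores have no finite estimate")
    # expected_score is strictly increasing in theta, so bisection is safe.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, log_easiness) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical log-easiness values for a five-item test.
log_easiness = [1.0, 0.5, 0.0, -0.5, -1.0]
table = {r: ability_for_raw_score(r, log_easiness)
         for r in range(1, len(log_easiness))}
for raw, theta in table.items():
    print(raw, round(theta, 3))
```

Bisection is used here only because it is simple and cannot diverge; any
root-finding method would serve. Raw scores of zero and of all items
correct are excluded because no finite estimate exists for them.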
The model and its properties are presented more fully in Chapter 2.
There, it is shown how the model leads to test construction guidelines
and to guidelines for assessing the degree to which the model is
appropriate for various sets of data.
Chapter 2
Evaluating Fit to the Rasch Model

The issue of whether or not a particular test fits the Rasch Model
is basic to the utility of the model and the attainment of the consequences
which the model promises to achieve. The problem of model fit is not at
all simple. The concept of testing or evaluating fit almost always implies
the examination of some appropriate set of data. Part of the problem
of evaluating fit is determining which data are appropriate for use in
the evaluation process. Should one deal with the fit of each individual
item or the test as a whole? Does a small proportion of nonfitting items
prohibit the use of the model for a particular test? How much misbehavior
in the data will the model tolerate? Each of these questions implies
some concern for evaluating model-data fit.

2.1 The Problem of Model-Data Fit

It is the thesis here that there exist two rather fundamentally
different types of applications of the Rasch Model that call for
correspondingly different concepts of model-data fit. The two types of
applications will be called test construction and test analysis and the
corresponding concepts of fit will be referred to as item fit for the
former situation and test fit for the latter. The purpose of this
chapter is to define these application situations and their respective
concepts of fit, and to illustrate the notion of test fit by applying
its principles to the reading achievement tests.
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The primary difference between the situations referred to as test 
construction and test analysis is the freedom to manipulate the test at 
the item level. In the case- of test construction , the Rasch Model can 
- be used as- a gurde,- or bliieprint^^-for— the selection of those items that 
will compose the test. The attention of the test maker is on developing 
or finding a set of items that in some acceptable seTise can be sa.id to fit* 
tKe model. The test maker has the freedom at this item analysis juncture 
of his task to discard poorly fitting items, retain good items, and modify 
other items as needed. Thus, for this application, indicators of model- 
data' fit are necessary for items , the presumption being that the final 
collection of items will include only those that meet whatever criteria 
for fit might be established: 

In the situation called test analysis, the particular collection of
test items is fixed. There is no freedom to discard poor items. Rather,
the objective in this case is to derive whatever benefits the model is
robust enough to provide under potentially less than ideal item fit
conditions. The chances are quite good that some proportion of items in
the given test would not have met a criterion for "fit" had the items
been evaluated during test construction. The extent to which items
defined as nonfitting, on the basis of item fit indicators, can be
allowed to contaminate a collection of fitting items is, of course, a
matter for investigation. The fact that the model has been shown to be
robust (i.e., the model tolerates some leniency with respect to strict
adherence to its assumptions) when some poorly fitting items were present
(Panchapakesan, 1969; C. Rentz, 1975) lends credence to the notion that
test analysis might be a fruitful area to explore. This is
precisely the situation that gave rise to the investigation on which we
are now reporting.

Georg Rasch (1966) proposed what he calls the simple item analysis
model as a way of achieving a desirable measurement principle called
specific objectivity. Rasch has more formally defined specific objectivity
as "... whenever the comparison of any two parameters within the same set
may be carried out in such a way that it (the comparison) is unaffected
by all other unknown parameters in the system ... the comparison is
characterized as 'specifically objective'." One implication of this
principle, for example, is that the difference between the parameters
of any two items is invariant with respect to the particular ability
parameters of a particular sample. This has the same implication as
Wright's (1968) phrase "sample free item calibration."

Actually, specific objectivity is an integral and formal part of
Rasch's model when the model is stated in the general form of the
hypothetical or IF-THEN statement, where the IF-clause represents
assumptions and the THEN-clause specifies the consequences. Figure 2.1.1
symbolizes this structure and shows the formal relationship between
Rasch's (1966) three assumptions and specific objectivity as the
consequence.

Figure 2.1.1: Rasch's Item Analysis Model

    IF
        1. $P(X_{ij} = 1) = \lambda_{ij} / (1 + \lambda_{ij})$
        2. $\lambda_{ij} = \xi_j \varepsilon_i$
        3. stochastic independence
    THEN
        specific objectivity

The three assumptions, only symbolized above, may be stated more
fully as follows (adapted from Rasch, 1966):

1. To each situation in which a person j (j = 1, 2, ..., n) is to
   answer an item i (i = 1, 2, ..., m) there is a corresponding
   probability of a correct answer ($X_{ij} = 1$) which may be written

   \[ P(X_{ij} = 1) = \frac{\lambda_{ij}}{1 + \lambda_{ij}}, \qquad \lambda_{ij} > 0 . \]

2. The situation parameter $\lambda_{ij}$ is the product of two factors,

   \[ \lambda_{ij} = \xi_j \, \varepsilon_i , \]

   where $\xi_j$ pertains to the person and $\varepsilon_i$ to the item; these
   parameters have been called respectively person ability and
   item easiness.

3. Given the values of the parameters, all answers are stochastically
   independent.

It is interesting to note that many writers represent the Rasch
Model by reference to assumptions 1 and 2, while our own preference is
for representation based on the entire structure. The advantage of such
a representation is that it permits some clarity to be introduced into
such questions as: What are the assumptions of the model? What are its
consequences? And what is the difference between these and deductions
derived from them? For example, conditions that are commonly identified
as necessary for model fit are unidimensionality of the trait being
measured, equal item discriminations, and the absence of guessing. These
conditions are not assumptions, but they can be easily deduced from the
assumptions. For example, since assumption 2 specifies only one item
parameter and one person parameter, it is clear that "variation" in any
other characteristic of either items or persons cannot be permitted.
Thus, some of those traditional constructs such as item discrimination
and guessing have not been parameterized. It is perhaps more proper
to identify these and other such deductions from the assumptions as
antecedent conditions, since they are derived from the antecedent clause
of the logical if-then statement.
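
The IF-THEN structure of Figure 2.1.1 can be illustrated numerically. The
Python sketch below assumes only the multiplicative form stated in
assumptions 1 and 2; the function names and all parameter values are
hypothetical and chosen for illustration. It shows that the ratio of the
odds of success on two items is the same for every person, which is one
way of seeing a comparison that is unaffected by the other parameters.

```python
def p_correct(ability, easiness):
    """Probability of a correct answer when the situation parameter is the
    product of person ability and item easiness (both positive)."""
    lam = ability * easiness
    return lam / (1.0 + lam)


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical parameter values, for illustration only.
easiness_a, easiness_b = 2.0, 0.5
abilities = [0.25, 1.0, 4.0]

# Odds ratio between the two items, computed separately for each person.
odds_ratios = [
    odds(p_correct(xi, easiness_a)) / odds(p_correct(xi, easiness_b))
    for xi in abilities
]
print(odds_ratios)
```

Because the odds of success equal the situation parameter itself, the person
factor cancels in the ratio, leaving only the ratio of the two easiness
values whatever ability is used.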

Similarly, it is possible to deduce from the model's consequence
component, that is, specific objectivity, certain other conditions
variously referred to as implications, outcomes, and consequences, such
as Wright's phrase "sample free item calibration." All such logical
deductions from the model's consequences will be referred to as
consequent conditions, for reasons corresponding to those previously
stated, since they are derived from the consequence clause.

The difference between the model's assumptions and consequences
and the conditions which they imply is that both the antecedent and
consequent conditions are data related: they enable us to translate
the formal statement of the model into constructs that are a bit more
operational. These conditions are operational, whereas the model and its
elements are merely psychometric symbols. Evaluation of model-data fit
must deal at the level of the antecedent and consequent conditions since
it is at this level that data can be mustered for the evaluation. A
difference between the model's assumptions and its consequences is that
the assumptions deal with items and the consequences imply sets of
items or tests. Thus, antecedent conditions are most likely to lead to
indicators of item fit, whereas consequent conditions might be most
useful in describing test fit.

These relationships lead to one way of defining item and test fit.
Item fit can be defined as the extent to which items can be characterized
according to those antecedent conditions derived from the model's
assumptions. Test fit can be defined as the extent to which the test
achieves those consequences specifiable from the concept of specific
objectivity.

Test fit might also be defined as the extent to which the test
contains fitting items, for example, in terms of the proportion of items
that fit the model, using some specified criterion of item fit. Thus,
two general approaches to the concept of test fit seem to be attractive:
the first based on the test's achieving specified consequent conditions
and the second based on the test's item composition. It might well turn
out that different methods of evaluating fit will be called for depending
on the particular problem area application. Yet, scientifically, we ought
to expect some degree of convergence among the different ways of examining
presumably the same thing. This issue is examined to some extent
in the present work.

In the present project we have taken the position that the consequent
condition approach to test fit is the most relevant one for equating.
The specific consequent condition is the stability of Rasch ability
parameter estimates. The ability parameters of the model are supposed
to be invariant with respect to any other person parameters. Consequently,
an examination of this consequent condition should help us evaluate that
aspect of the degree of model-data fit that is most relevant and, as
such, provide information on the usefulness of the model for equating
those reading tests in the present sample.

Our definition of equating involves only the raw scores and those
ability values estimated for them. Thus, given a common scale and as
long as the values in the "scoring table" (i.e., the set of all raw scores
and ability estimates) remain constant for two tests, the equating results
cannot vary. This means that systematic variation in calibration conditions
(such as race, sex, grade) is inconsequential as long as "scoring
tables" remain invariant.[1] This invariance or stability of ability
estimates is the most direct measure of the Model's usefulness for
equating; however, an examination of item stability as well as examination
of certain antecedent conditions should also provide useful information
to evaluate model-data fit.

[1] There are applications where item stability would be significantly
more important. One such application is tailored testing, in which there
is an attempt to match items to people, and since misfit affects the
stability of items more than it does the abilities, a higher degree
of fit would be required than that necessary for applications requiring
only stable ability estimates.

2.2 Stability of Parameter Estimates

In order to study how various factors influence the stability of
estimates, it is necessary to define a measure of stability that can be
compared across different tests and across different analyses of the
same test. The word "stability" implies that a set of estimates of the
same parameter (i.e., for a single item or single raw score) will be
invariant over repeated observations. The ordinary standard deviation
can be used as a measure of this invariance. Specifically, whenever
we have multiple occasions to estimate an item parameter, the standard
deviation of the distribution of estimates can be used to describe the
stability of that item's estimate. If we want to summarize stability for
a test, with respect to the items, then the average of these standard
deviations will do quite well. Similar procedures can be used for the
ability estimates. This average of standard deviations we will call a
"stability index" and use it as a measure of invariance throughout this
report. The index takes low values when stability is high and increases
as stability decreases.

It is useful at this point to remind the reader that throughout this
study we have reported the parameter estimates in the natural log units
in which they are traditionally reported. Thus, both item easiness and
person ability are measured on a common scale; comparisons of the stability
of these estimates are consequently appropriate.
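
The stability index defined in this section can be sketched in a few lines
of Python. The function name and the estimates below are hypothetical, and
the report does not say whether its standard deviations used n or n - 1 in
the denominator; the sketch assumes the sample form (n - 1).

```python
from statistics import mean, stdev


def stability_index(estimates_by_parameter):
    """Average, over parameters, of the standard deviation of each
    parameter's estimates across repeated calibrations."""
    return mean([stdev(estimates) for estimates in estimates_by_parameter])


# Hypothetical estimates in log units: 3 items, each calibrated 4 times.
item_estimates = [
    [1.70, 1.78, 1.74, 1.82],
    [0.10, 0.02, 0.08, 0.04],
    [-2.20, -2.31, -2.25, -2.28],
]
index = stability_index(item_estimates)
print(round(index, 4))
```

The same function applies unchanged to ability estimates, with one inner
list per raw score group instead of one per item.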

2.3 Stability as a Function of Sample Size 

The purpose of this analysis was to determine the effect of sample
size on the stability of Rasch parameter estimates. This issue provides
a good point of departure for studies of stability since it provides some
information on the expected variability of the easiness and ability
parameter estimates, over random replication of samples of the same size,
with tests composed of items typical of those in the present project.

STEP vocabulary was used in this analysis since there were more
observations on it (N = 33,123) than on any other test and those observations
spanned all three grades. Fifteen random samples were drawn
from the STEP data file for each of four sample sizes: 500, 1,000, 2,000,
and 4,000. Each of these sets of data was then analyzed by the MESAMAX item
analysis program, and summary statistics were computed over the 15 replications
of each sample size.
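
The replication design just described can be mimicked on simulated data.
The sketch below is only an illustration of the procedure: the data file is
simulated, the item easiness values are invented, the sample sizes are
scaled down so the example runs quickly, and the calibration step is a
simple stand-in (centered log-odds of the proportion correct), not the
MESAMAX procedure.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)

# Hypothetical data file: 5,000 simulated examinees answering 10 items.
# Easiness values are in log units and are invented for illustration.
easiness = [1.5, 1.0, 0.5, 0.25, 0.0, -0.25, -0.5, -1.0, -1.5, -2.0]
data_file = []
for _ in range(5000):
    ability = random.gauss(0.0, 1.0)
    data_file.append([
        1 if random.random() < 1.0 / (1.0 + math.exp(-(ability + d))) else 0
        for d in easiness
    ])


def centered_logits(sample):
    """Stand-in calibration: log-odds of the proportion correct for each
    item, centered at zero. This is NOT the MESAMAX procedure."""
    n = len(sample)
    logits = []
    for i in range(len(sample[0])):
        p = sum(row[i] for row in sample) / n
        logits.append(math.log(p / (1.0 - p)))
    center = mean(logits)
    return [x - center for x in logits]


def item_stability_index(sample_size, replications=15):
    """Mean over items of the S.D. of estimates across random samples."""
    runs = [centered_logits(random.sample(data_file, sample_size))
            for _ in range(replications)]
    return mean([stdev([run[i] for run in runs])
                 for i in range(len(easiness))])


for n in (250, 1000, 4000):
    print(n, round(item_stability_index(n), 4))
```

Each call draws 15 random samples of the requested size without
replacement, calibrates each one separately, and averages the per-item
standard deviations, which is the summary reported in the tables that
follow.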

Tables 2.3.1 through 2.3.4 show the results of this analysis for the
item easiness parameter estimates and Tables 2.3.5 through 2.3.8 show the
results for the ability parameter estimates. Each table contains either
the item number or the score group number and the mean, standard deviation,
maximum estimate, minimum estimate, and range computed over the 15 replications
of that particular sample size.

Of particular interest are the standard deviations listed in these
tables. They provide an index of the stability of a particular parameter
estimate and can be compared across analyses. In general the results
show that the stability measures improve with sample size. Table 2.3.9
provides this comparison in summary form. It shows the mean of these
"stability indexes" (standard deviations) computed over the 30 items and
29 score groups.

Table 2.3.9 shows that the ability estimates are more stable than
the item easiness estimates. There is some tendency for the stability
of the ability estimates to get better with increases in the size of the
calibrating sample. The fact that easiness estimates are more sensitive
to different sample sizes than are ability estimates is neither unexpected
nor mysterious. The basic observation for estimating easiness is p,
the proportion answering the item correctly, a number whose accuracy depends
directly on the sample size. The stability of ability estimates depends
on both the item easiness estimates and the number of items. Thus the
extent of sample size influence on ability estimates is limited by its
influence on item easiness. Furthermore, the influence of item easiness
variability tends to attenuate as the number of items becomes greater.
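
A rough sense of why the easiness estimates respond so directly to sample
size can be had from the usual large-sample approximation for a proportion.
The expressions below assume, purely for illustration, that an easiness
estimate behaves like the log-odds of p; the calibration program may well
compute it differently.

```latex
\[
\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{N},
\qquad
\operatorname{SE}\!\left(\log\frac{\hat{p}}{1-\hat{p}}\right)
  \approx \frac{1}{\sqrt{N\,p(1-p)}}
\]
```

Under this approximation, doubling N shrinks the sampling variability by a
factor of about 1.41 (the square root of 2); the item easiness indexes in
Table 2.3.9 for N = 500 and N = 1000 stand in a ratio of about 1.36.
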

TABLE 2.3.1
Stability of Item Parameter Estimates
as a Function of Sample Size (N=500)

ITEM NO 1-30

TABLE 2.3.2
Stability of Item Parameter Estimates
as a Function of Sample Size (N=1000)

TABLE 2.3.3
Stability of Item Parameter Estimates
as a Function of Sample Size (N=2000)

ITEM NO     MEAN      S.D.    MAXIMUM   MINIMUM    RANGE
   29                                              0.2660
   30     -2.2390    0.0648   -2.1440   -2.3880    0.2440

TABLE 2.3.4
Stability of Item Parameter Estimates
as a Function of Sample Size (N=4000)

ITEM NO     MEAN      S.D.    MAXIMUM   MINIMUM    RANGE
    1      1.7510    0.0535    1.8360    1.6280    0.2080
    2      2.3279    0.0971    2.5200    2.1860    0.3340
    3      2.0340    0.0569    2.1470    1.9150    0.2320
    4      1.2581    0.0697    1.4910    1.1890    0.3020
    5      1.5039    0.0393    1.5950    1.4370    0.1580
    6      1.1564    0.0522    1.2580    1.0790    0.1790
    7      1.7319    0.0733    1.9690    1.6500    0.3190
    8      1.3023    0.0399    1.3980    1.2260    0.1720
    9      1.3083    0.0592    1.4040    1.2210    0.1830
   10      0.115?    0.0464    0.2260    0.0580    0.1680
   11      0.0745    0.0581    0.1320   -0.0450    0.1770
   12      0.5723    0.0488    0.7030    0.5130    0.1900
   13      0.8583    0.0592    1.0190    0.7750    0.2440
   14      0.5324    0.0762    0.7430    0.4380    0.3050
   15      0.2321    0.0417    0.2870    0.1300    0.1570
   16     -0.0439    0.0468    0.0570   -0.1200    0.1770
   17      0.5401    0.0649    0.7280    0.4640    0.2640
   18     -0.1120    0.0465   -0.0260   -0.1980    0.1720
   19     -0.0435    0.0363    0.0280   -0.1190    0.1470
   20     -1.6046    0.0579   -1.5250   -1.7430    0.2180
   21     -1.4405    0.0211   -1.3950   -1.4770    0.0820
   22     -0.9089    0.0467   -0.8460   -1.0290    0.1830
   23     -0.5311    0.0436   -0.4520   -0.6390    0.1870
   24     -1.5406    0.0494   -1.4700   -1.6940    0.2240
   25     -1.6173    0.0478   -1.5450   -1.7330    0.1880
   26     -0.9085    0.0385   -0.8510   -0.9890    0.1380
   27     -1.9412    0.0932   -1.8050   -2.2060    0.4010
   28     -1.9111    0.0383   -1.8690   -1.9900    0.1210
   29     -2.4565    0.0830   -2.3700   -2.7170    0.3470
   30     -2.2594    0.0586   -2.1750   -2.4200    0.2450

TABLE 2.3.5
Stability of Ability Parameter Estimates
as a Function of Sample Size (N=500)

SCORE GROUP     MEAN      S.D.    MAXIMUM   MINIMUM    RANGE
     1        -4.0903    0.0421   -3.9880   -4.1470    0.1590
     2        -3.3171    0.0388   -3.2240   -3.3680    0.1440
     3        -2.8299    0.0357   -2.7450   -2.8760    0.1310
     4        -2.4587    0.0328   -2.3820   -2.5000    0.1180
     5        -2.1499    0.0297   -2.0810   -2.1870    0.1060
     6        -1.8796    0.0266   -1.8190   -1.9120    0.0930
     7        -1.6350    0.0239   -1.5820   -1.6630    0.0810
     8        -1.4085    0.0211   -1.3630   -1.4330    0.0700
     9        -1.1949    0.0184   -1.1560   -1.2150    0.0590
    10        -0.9901    0.0158   -0.9590   -1.0070    0.0480
    11        -0.7923    0.0133   -0.7680   -0.8090    0.0410
    12        -0.5991    0.0111   -0.5800   -0.6150    0.0350
    13        -0.4085    0.0092   -0.3930   -0.4240    0.0310
    14        -0.2193    0.0078   -0.2060   -0.2340    0.0280
    15        -0.0304    0.0073   -0.0200   -0.0440    0.0240
    16         0.1597    0.0078    0.1730    0.1480    0.0250
    17         0.3521    0.0091    0.3680    0.3360    0.0320
    18         0.5481    0.0110    0.5660    0.5250    0.0410
    19         0.7487    0.0133    0.7690    0.7190    0.0500
    20         0.9556    0.0159    0.9780    0.9190    0.0590
    21         1.1707    0.0185    1.1960    1.1280    0.0680
    22         1.3963    0.0214    1.4250    1.3470    0.0780
    23         1.6353    0.0242    1.6660    1.5800    0.0860
    24         1.8929    0.0273    1.9260    1.8310    0.0950
    25         2.1762    0.0302    2.2110    2.1090    0.1020
    26         2.4981    0.0333    2.5350    2.4250    0.1100
    27         2.8819    0.0367    2.9250    2.8030    0.1220
    28         3.3809    0.0399    3.4320    3.2970    0.1350
    29         4.1649    0.0434    4.2250    4.0760    0.1490

TABLE 2.3.6
Stability of Ability Parameter Estimates
as a Function of Sample Size (N=1000)

SCORE GROUP     MEAN      S.D.    MAXIMUM   MINIMUM    RANGE
     1        -4.0946    0.0308   -4.0160   -4.1390    0.1230
     2        -3.3205    0.0279   -3.2490   -3.3600    0.1110
     3        -2.8323    0.0252   -2.7670   -2.8670    0.1000
     4        -2.4601    0.0224   -2.4010   -2.4900    0.0890
     5        -2.1505    0.0200   -2.0970   -2.1760    0.0790
     6        -1.8796    0.0176   -1.8320   -1.9030    0.0710
     7        -1.6344    0.0152   -1.5930   -1.6560    0.0630
     8        -1.4073    0.0133   -1.3710   -1.4280    0.0570
     9        -1.1933    0.0111   -1.1630   -1.2120    0.0490
    10        -0.9884    0.0094   -0.9630   -1.0050    0.0420
    11        -0.7903    0.0078   -0.7710   -0.8060    0.0350
    12        -0.5968    0.0066   -0.5820   -0.6100    0.0280
    13        -0.4063    0.0060   -0.3970   -0.4180    0.0210
    14        -0.2170    0.0058   -0.2060   -0.2270    0.0210
    15        -0.0282    0.0059   -0.0170   -0.0370    0.0200
    16         0.1618    0.0066    0.1740    0.1530    0.0210
    17         0.3541    0.0077    0.3670    0.3420    0.0250
    18         0.5496    0.0087    0.5640    0.5320    0.0320
    19         0.7500    0.0100    0.7660    0.7270    0.0390
    20         0.9567    0.0112    0.9740    0.9290    0.0450
    21         1.1713    0.0127    1.1900    1.1380    0.0520
    22         1.3964    0.0144    1.4160    1.3580    0.0580
    23         1.6351    0.0161    1.6560    1.5910    0.0650
    24         1.8921    0.0179    1.9180    1.8430    0.0750
    25         2.1752    0.0198    2.2060    2.1210    0.0850
    26         2.4967    0.0217    2.5330    2.4380    0.0950
    27         2.8803    0.0236    2.9220    2.8180    0.1040
    28         3.3791    0.0256    3.4260    3.3130    0.1130
    29         4.1633    0.0278    4.2160    4.0940    0.1220

TABLE 2.3.7
Stability of Ability Parameter Estimates
as a Function of Sample Size (N=2000)

SCORE GROUP     MEAN      S.D.    MAXIMUM   MINIMUM    RANGE
     1        -4.0838    0.0191   -4.0550   -4.1200    0.0650
     2        -3.3111    0.0178   -3.2850   -3.3450    0.0600
     3        -2.8241    0.0164   -2.8000   -2.8550    0.0550
     4        -2.4531    0.0154   -2.4310   -2.4810    0.0500
     5        -2.1446    0.0140   -2.1240   -2.1710    0.0470
     6        -1.8749    0.0127   -1.8560   -1.9000    0.0440
     7        -1.6306    0.0113   -1.6140   -1.6540    0.0400
     8        -1.4043    0.0099   -1.3900   -1.4260    0.0360
     9        -1.1911    0.0087   -1.1780   -1.2110    0.0330
    10        -0.9870    0.0076   -0.9760   -1.0050    0.0290
    11        -0.7896    0.0065   -0.7800   -0.8050    0.0250
    12        -0.5968    0.0056   -0.5890   -0.6100    0.0210
    13        -0.4067    0.0048   -0.3990   -0.4170    0.0180
    14        -0.2182    0.0046   -0.2100   -0.2260    0.0160
    15        -0.0297    0.0047   -0.0220   -0.0380    0.0160
    16         0.1599    0.0051    0.1680    0.1510    0.0170
    17         0.3517    0.0055    0.3610    0.3430    0.0180
    18                   0.0063    0.5580    0.5370    0.0210
    19         0.7470    0.0072    0.7600    0.7360    0.0240
    20         0.9532    0.0083    0.9680    0.9410    0.0270
    21         1.1677    0.0094    1.1840    1.1550    0.0290
    22         1.3925    0.0107    1.4120    1.3780    0.0340
    23         1.6308    0.0116    1.6540    1.6140    0.0400
    24         1.8877    0.0130    1.9150    1.8680    0.0470
    25         2.1705    0.0140    2.2010    2.1490    0.0520
    26         2.4915    0.0150    2.5250    2.4670    0.0580
    27         2.8746    0.0161    2.9110    2.8480    0.0630
    28         3.3732    0.0169    3.4120    3.3440    0.0680
    29         4.1567    0.0179    4.1980    4.1260    0.0720

TABLE 2.3.8
Stability of Ability Parameter Estimates
as a Function of Sample Size (N=4000)

SCORE GROUP     MEAN      S.D.    MAXIMUM   MINIMUM    RANGE
     1        -4.0893    0.0251   -4.0590   -4.1640    0.1050
     2        -3.3164    0.0235   -3.2910   -3.3890    0.0980
     3        -2.8293    0.0221   -2.8080   -2.9000    0.0920
     4        -2.4583    0.0211   -2.4400   -2.5270    0.0870
     5        -2.1497    0.0201   -2.1330   -2.2170    0.0840
     6        -1.8795    0.0190   -1.8640   -1.9440    0.0800
     7        -1.6353    0.0181   -1.6200   -1.6970    0.0770
     8        -1.4088    0.0170   -1.3950   -1.4670    0.0720
     9        -1.1951    0.0159   -1.1820   -1.2500    0.0680
    10        -0.9909    0.0145   -0.9790   -1.0410    0.0620
    11        -0.7931    0.0133   -0.7820   -0.8390    0.0570
    12        -0.5997    0.0117   -0.5910   -0.6400    0.0490
    13        -0.4093    0.0099   -0.4020   -0.4430    0.0410
    14        -0.2203    0.0081   -0.2140   -0.2470    0.0330
    15        -0.0312    0.0062   -0.0250   -0.0500    0.0250
    16         0.1589    0.0044    0.1650    0.1480    0.0170
    17         0.3513    0.0035    0.3580    0.3450    0.0130
    18         0.5473    0.0040    0.5550    0.5400    0.0150
    19         0.7481    0.0063    0.7660    0.7400    0.0260
    20         0.9552    0.0091    0.9840    0.9470    0.0370
    21         1.1704    0.0120    1.2100    1.1600    0.0500
    22         1.3959    0.0150    1.4460    1.3830    0.0630
    23         1.6355    0.0182    1.6970    1.6200    0.0770
    24         1.8931    0.0211    1.9650    1.8750    0.0900
    25         2.1767    0.0244    2.2600    2.1560    0.1040
    26         2.4987    0.0272    2.5920    2.4760    0.1160
    27         2.8828    0.0304    2.9870    2.8580    0.1290
    28         3.3823    0.0331    3.4960    3.3560    0.1400
    29         4.1669    0.0358    4.2900    4.1400    0.1500

TABLE 2.3.9
Average Stability Indexes for STEP II-Vocabulary for
Calibration Situations Differing as a Function of Sample Size

                             Sample Size
                    500      1000      2000      4000
Item Easiness      .1206     .0890     .0623     .0548
Ability            .0229     .0155     .0109     .0158



The consequence of the interplay of these factors is the observed
difference between the stability of the easiness and ability estimates.

Even so, the amount of instability in the ability estimates is not
large, even for the case of N = 500. Some perspective on this can be
gained by examining Table 2.3.5, which shows the ability estimate results
for samples of size 500. Notice the maximums and minimums for adjacent
score groups. If these values were upper and lower bounds for "confidence
intervals" they would not overlap, except in one or two cases.

To elaborate on the issue of the amount of instability, another
comparison of interest is that involving the standard error of the ability
estimate (error of measurement) associated with the ability parameters
(i.e., from the item analyses) and these stability indexes. The standard
error for the ability estimate corresponding to a raw score of 19 is
about .45 regardless of the particular sample size analysis, yet the
standard deviation of that estimate (score group 19, Table 2.3.5) is
.013 for samples of 500. In fact the range of estimates is only .05,
about 1/9 the size of the measurement error.
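
The figure of about .45 can be checked against the usual large-sample
formula for the standard error of a Rasch ability estimate, one over the
square root of the sum of p(1 - p) across items. The Python sketch below is
a check under stated assumptions, not the report's own computation: it
takes the mean item easiness values for N = 4000 as read from Table 2.3.4
(item 10 is only partly legible in the source, so 0.1156 is an assumed
reading) and the mean ability estimate for score group 19 from Table 2.3.8.

```python
import math

# Mean item easiness estimates (log units), N = 4000, read from Table 2.3.4.
# Item 10 is only partly legible in the source; 0.1156 is an assumed reading.
easiness = [
    1.7510, 2.3279, 2.0340, 1.2581, 1.5039, 1.1564, 1.7319, 1.3023,
    1.3083, 0.1156, 0.0745, 0.5723, 0.8583, 0.5324, 0.2321, -0.0439,
    0.5401, -0.1120, -0.0435, -1.6046, -1.4405, -0.9089, -0.5311,
    -1.5406, -1.6173, -0.9085, -1.9412, -1.9111, -2.4565, -2.2594,
]
ability_19 = 0.7481  # mean ability estimate, score group 19, Table 2.3.8

# In log units the model reads P = exp(b + d) / (1 + exp(b + d)),
# with b the log ability and d the log easiness.
probs = [1.0 / (1.0 + math.exp(-(ability_19 + d))) for d in easiness]

expected_score = sum(probs)                      # should sit near raw score 19
information = sum(p * (1.0 - p) for p in probs)  # test information at b
standard_error = 1.0 / math.sqrt(information)

range_19 = 0.05  # range of the 15 estimates for N = 500 (Table 2.3.5)
print(round(expected_score, 2), round(standard_error, 2))
print(round(range_19 / standard_error, 2))
```

Under these assumptions the expected score comes out close to 19 and the
standard error close to .45, in line with the value quoted above, so the
range of .05 is roughly one ninth of the measurement error.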

2.4 Stability over Occurrence in the Design

Part of the analysis requirement necessary for this project was
to conduct a separate item analysis of each test (vocabulary, comprehension,
total reading) whenever it occurred in the design. As Table 1.3.2
shows, there were 136 cells in the design matrix. Since each cell
entry represents a pair of tests, one of which was administered first
(i.e., according to the row index DSN number), it is possible to summarize
the estimates of each item and ability parameter over their various
occurrences in the design.




For example, CAT, Level 3, Form A was administered in 20 cells,
10 times as a first test and 10 times as a second test. Considering
the vocabulary subtest, which has 40 items, the result is 20 estimates of
the 40 item parameters (and ability parameters), with each estimate computed
on a different sample and administered in combination with a different
test. Thus, it is possible to study the stability of parameter
estimates over replications in the design with some deviation in sample
size, number of occurrences, test pair combinations, and order of
administration.

Appendix A contains Tables for each of the 14 primary form tests
for vocabulary as "first tests" and "secondary tests" separately. A
separate Table is also presented for item and ability estimates. The
Tables contain means and standard deviations over the "replications"
(occurrences).

Here, as in the discussion of sample size, the standard deviations
listed in the Tables provide information on stability. These "stability
index" values are summarized in Tables 2.4.1 and 2.4.2 for item easiness
and ability respectively. According to the rationale presented in
Section 2.1 on the importance of stability of parameter estimates, it
follows that the data presented here are probably the best measure of
the model-data fit consequences with which we are concerned.

TABLE 2.4.1
Stability Indexes for Item Parameter Estimates for the 14 Primary
Forms when the Tests were Administered First and Second

                      Vocabulary                  Comprehension
Test Name       First Test   Second Test    First Test   Second Test
CAT 3-A           .1190        .1296          .1055        .1199
CAT 4-A           .0946        .1060          .0907        .0954
CTBS 2-Q          .1304        .1408          .1127        .1070
CTBS 3-Q          .0983        .1233          .0852        .0962
ITBS 10-5         .1187        .1110          .1071        .1006
ITBS 11-5         .11??        .1038          .0903        .0905
ITBS 12-5         .1065        .1083          .0887        .0956
MAT E-F           .1475        .1587          .1134        .1128
MAT I-F           .1369        .1402          .1086        .1147
STEP 4-A          .1401        .1375          .1297        .1084
SRA BL-E          .1050        .1252          .1066        .1108
SRA GR-E          .1165        .1224          .0940        .0802
SAT I-W           .1195        .1265          .1072        .0948
SAT II-W          .1440        .1430          .1081        .1048

TABLE 2.4.2
Stability Indexes for Ability Parameter Estimates for the 14 Primary
Forms when the Tests were Administered First and Second

                      Vocabulary                  Comprehension
Test Name       First Test   Second Test    First Test   Second Test
CAT 3-A           .0218        .0191          .0278        .0347
CAT 4-A           .0227        .0163          .0127        .0088
CTBS 2-Q          .0246        .0270          .0188        .0138
CTBS 3-Q          .0105        .0216          .0086        .0065
ITBS 10-5         .0152        .0145          .0175        .0163
ITBS 11-5         .0131        .0084          .0134        .0105
ITBS 12-5         .0144        .0165          .0060        .0094
MAT E-F           .0374        .0442          .0275        .0240
MAT I-F           .0283        .0235          .0221        .0209
STEP 4-A          .0389        .0308          .0105        .0112
SRA BL-E          .0195        .0215          .0248        .0229
SRA GR-E          .0174        .0255          .0100        .0061
SAT I-W           .0204        .0256          .0229        .0179
SAT II-W          .0368        .0302          .0203        .0181

It is interesting to observe that the value of the stability indexes
for ability can be approximated by dividing the item stability index
by the square root of the number of items in that test. This is rather
intriguing since it implies that the stability of ability estimates can
be increased by increasing the number of items (i.e., items like the ones
already included). Such a relationship is well known in test
theory, and it is comforting when it is once again observed.
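
The square-root approximation can be tried on figures reported earlier. The
Python sketch below is an illustration only: it uses the average indexes
from Table 2.3.9 for STEP vocabulary (30 items) and the first-test
vocabulary indexes for CAT 3-A (40 items); the variable names are
hypothetical.

```python
import math

n_items_step = 30  # STEP vocabulary: indexes were averaged over 30 items

# Average stability indexes from Table 2.3.9, keyed by sample size.
item_index = {500: 0.1206, 1000: 0.0890, 2000: 0.0623, 4000: 0.0548}
ability_index = {500: 0.0229, 1000: 0.0155, 2000: 0.0109, 4000: 0.0158}

# Approximation: ability index ~ item index / sqrt(number of items).
approx = {n: item_index[n] / math.sqrt(n_items_step) for n in item_index}
for n in sorted(approx):
    print(n, round(approx[n], 4), ability_index[n])

# CAT 3-A vocabulary as a first test: 40 items, item index .1190,
# observed ability index .0218 (Tables 2.4.1 and 2.4.2).
cat_approx = 0.1190 / math.sqrt(40)
print(round(cat_approx, 4), 0.0218)
```

The approximation tracks the observed ability indexes fairly closely for
the three smaller sample sizes and less well for N = 4000, which fits the
report's wording that the ability index "can be approximated" in this way.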

Another observation from the data in Table 2.4.2 is that the tests
as a group are rather homogeneous with respect to these indexes. For
the ability estimates, with which we are most directly concerned, the
stability indexes for a test average between .01 and .04, which we can
compare with the values obtained in the previous section.

2.5 Stability as a Function of Sample Composition 

The two previous sections of this report have provided some indication
of the degree of stability of the Rasch Model parameter estimates
over samples that can be considered more or less "random", or at least
nonsystematic. As such they provide information on the extent to which
these estimates can be expected to vary under "unselected sampling"
conditions. The results of those analyses showed that the parameter
estimates have a high degree of stability, and this is especially true
for ability estimates, on which equating depends. This section describes
those analyses which deal with stability of parameter estimates for
selected subgroups of individuals identifiable by certain characteristics.

In addition to item responses, the individual student's data record
contained codes for him on several demographic type variables. Among
these were sex, race, IQ, grade, size of school system, and the schools'
estimated percentage of students on welfare. Table 2.5.1 shows these
variables, their codes and descriptions. This analysis, then, was
concerned with studying the stability of parameter estimates when the
samples used were homogeneous with respect to selected categories of
the above variables.

TABLE 2.5.1
Sample Composition Variables, Codes and Descriptions
as Contained on the Anchor Test Study Data Tapes

Name           Code      Description
Sex            1         Boys
               2         Girls
               Blank     Not indicated
Race           1         Indian
               2         Negro
               3         Oriental
               4         Spanish surnamed
               5         White and others
               Blank     Not indicated
IQ             1         < 75
               2         75-89
               3         90-110
               4         111-125
               5         > 125
               6         Not tested
Grade          4         Fourth
               5         Fifth
               6         Sixth
Size*          4         < 50
               8         50-99
               9         100-199
               22        200-499
               35        500-1199
               70        ≥ 1200
% Welfare**    1         None
               2         1-10%
               3         11-25%
               4         26-50%
               5         51-75%
               6         76-90%

*  Measure of school system size in terms of enrollment in grades
   4, 5, and 6.
** Estimate of percentage of family income provided by public
   welfare.

TABLE 2.5.2
Description of Sample Composition Analysis
Groups for STEP Vocabulary

Variable      Groups      Codes Used    Sample Size
Sex           Boys        1             16607
              Girls       2
Race          Black       2              4759
              Spanish     4              1540
              White       5             26245
IQ            Low         1,2            4254
              Middle      3             10864
              High        4,5            8216
Grade         4           4             10865
              5           5             11182
              6           6             11076
Size          Small       4,8,9          3432
              Medium      22            20736
              Large       35,70          8955
% of P.W.     0%          1              3672
              1-10%       2             18507
              11-25%      3              5792
              26-50%      4              4155
              51-90%      5,6             997

The study of sample composition was divided into two parts: (1) an
examination of all six sample composition variables for one test, and (2) an
examination of all primary form vocabulary tests on race and IQ. For
the first part STEP vocabulary was chosen because of its large sample
size (as mentioned in section 2.3). In each analysis, for each subgroup,
ability parameter estimates were obtained and plotted. Stability indexes
were also computed for the data corresponding to each of the plots.

Figures 2.5.1 through 2.5.6 (in Appendix B) show the results
of these analyses for STEP vocabulary on each of the six sample composition
variables. Except perhaps for IQ, the results for the other five
variables show practically identical ability parameter estimates.

Part two of this analysis involved examining each of the primary
form vocabulary tests on race and IQ. Table 2.5.3 contains the sample
sizes used for each test and composition group. Following the analyses
to estimate the parameters for each calibration condition, the ability
parameter estimates for each sample composition group on each test were
plotted to display the degree of invariance in the parameter estimates.
These results are shown as Figures 2.5.7 through 2.5.32 in Appendix B.

In addition to the plots, the information on stability was summarized
in the form of stability indexes for each of the tests. Those data are
shown in Table 2.5.4, and they indicate that there is more instability
across IQ groups than across race groups. As a matter of fact, whenever
studies like this are conducted, where stability is observed across
samples differing in composition, the variable most closely related to
the latent trait being measured by the test will show the greatest
instability, as long as the test contains items with less than perfect
model-data fit.


TABLE 2.5.3
Sample Sizes on Each Vocabulary Test for
Sample Composition Analyses by Race and IQ

                          Race                            IQ
Test Name          B         S         W         L         M         H

CAT 3-A         3226      1131     18406      3556      8755      5499
CAT 4-A         1645       549      9209      1655      4429      3181
CTBS 2-Q        4485       963     16406      3300      7895      5168
CTBS 3-Q        2281       443      8129      1654      4021      2855
ITBS 10-5       1443       635      8228      1209      3529      2491
ITBS 11-5       1513       660      8570      1385      3889      2880
ITBS 12-5       1664       593      8700      1344      4134      3020
MAT E-F         1954       500      8333      1814      4011      2309
MAT I-F         3644       985     17287      3445      8590      5258
STEP 4-A   [illegible]    1537     25755      4252     10831      2845
SRA BL-E        4524      1321     15254      3427      6825      4224
SRA GR-E        2676       684      8561      2154      3989      2682
SAT I-W         1767       902      7540      1426      3450      1914
SAT II-W        3781      1881     15219 [illegible]    7261      4943


TABLE 2.5.4
Ability Parameter Stability Indexes for 14
Tests Over Samples Differing in Composition

                    Stability Index
Test Name          Race          IQ

CAT 3-A           .0577        .0983
CAT 4-A           .0943        .1596
CTBS 2-Q          .0596        .1097
CTBS 3-Q          .0562        .1111
ITBS 10-5         .0866        .1058
ITBS 11-5         .0790        .0940
ITBS 12-5         .0947        .1299
MAT E-F           .1035        .1465
MAT I-F           .1110        .1440
STEP 4-A          .1448        .1806
SRA BL-E          .0951        .1289
SRA GR-E          .1100        .1585
SAT I-W           .1397        .1947
SAT II-W          .1216        .1611

2.6 Describing Test Fit According to Certain Antecedent Conditions

In the beginning of this chapter two general approaches to describing
test fit were mentioned. One was in terms of the proportion of fitting
items contained in the test and the other was in terms of the achievement
of those consequent conditions that the model predicts. One
such consequent condition is stability, or invariance, of parameter
estimates, which we have discussed in the previous sections. In this
section we will present various indicators of test fit in terms of
item fit, and discuss their relationship with stability.

The item analysis program, MESAMAX, provides two statistics for the
items that are useful in dealing with the issue of item fit. One is an
index of item discrimination and is called "slope." This is a least
squares estimate of the slope of the item characteristic curve, after a
linearizing transformation. It is based on fitting the line of regression
of "percentage correct" on an ability estimate corresponding to each
possible raw score. Both the raw scores and percentage correct are
transformed by a log odds transformation to linearize the item characteristic
curve. Thus, slope is the regression of "item log odds" on
"test log odds." Theoretically, slope values should be near unity for
fitting items.
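
The slope index described above can be sketched as an ordinary least-squares fit on log odds. This is only an illustration under stated assumptions: the function names are hypothetical, the regression is unweighted, and score groups with proportions of exactly 0 or 1 are skipped. The actual MESAMAX computation may differ in these details.

```python
import math


def logit(p):
    """Return the log odds of a proportion p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))


def item_slope(n_items, prop_correct_by_score):
    """Least-squares slope of "item log odds" on "test log odds".

    prop_correct_by_score maps a raw score r (1 <= r <= n_items - 1) to the
    proportion of examinees with that score who answered the item correctly.
    Score groups whose proportion is exactly 0 or 1 are skipped because
    their log odds are undefined (an assumption of this sketch).
    """
    xs, ys = [], []
    for raw_score, p in prop_correct_by_score.items():
        if 0 < raw_score < n_items and 0.0 < p < 1.0:
            xs.append(logit(raw_score / n_items))  # test log odds
            ys.append(logit(p))                    # item log odds
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical item on a 40-item test whose log odds of success equal the
# test log odds minus 0.5, so the fitted slope should be close to 1.
n_items = 40
synthetic = {
    r: 1.0 / (1.0 + math.exp(-(logit(r / n_items) - 0.5)))
    for r in range(1, n_items)
}
slope = item_slope(n_items, synthetic)
```

An item constructed to follow the model exactly, as in the synthetic data here, yields a slope of essentially 1; items that discriminate more or less sharply than the model assumes would give slopes above or below unity.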

Since slopes should be unity for fitting items, deviant values for
items are an indicator of misfit. It seems reasonable, then, that
measures of dispersion of the distribution of item slopes for a test
could be used as measures of test fit. In addition, it seems that the
shape of the distributions of slopes may be an important indicator of
fit. Yet another way to approach a slope index of fit is to specify a
criterion for fit, such as the interval 1.0 ± .20 (see footnote 5), and
determine the relative number of items that meet that criterion. Table 2.6.1
presents various slope measures of fit for the 14 primary form vocabulary
tests, as well as the distributions of slope values.
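
The two slope indexes of test fit mentioned above, the semi-interquartile range and the percentage of items within 1.0 ± .20, can be computed as in the following sketch. The slope values are hypothetical, and the quartile convention (Python's default "exclusive" method) is an assumption; the report does not say which convention was used.

```python
import statistics


def slope_fit_indexes(slopes, lower=0.80, upper=1.20):
    """Return (semi-interquartile range, percent of slopes in [lower, upper])."""
    q1, _median, q3 = statistics.quantiles(slopes, n=4)
    semi_iqr = (q3 - q1) / 2.0
    n_within = sum(1 for s in slopes if lower <= s <= upper)
    pct_within = 100.0 * n_within / len(slopes)
    return semi_iqr, pct_within


# Hypothetical slopes for an eight-item test (not taken from the report).
slopes = [0.70, 0.85, 0.95, 1.00, 1.05, 1.10, 1.25, 1.50]
q, pct = slope_fit_indexes(slopes)
```

The percentage index can be checked against Table 2.6.1 by hand: for CAT 3-A, the four intervals with upper limits .9 through 1.2 hold 4 + 8 + 8 + 6 = 26 of 40 items, which gives the tabled 65.0 percent.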

Mean Square Fit is the MESAMAX index of fit of the item to the model.
This index is a function of sample size; therefore, its interpretation
must be made with care. For a particular test, the values of mean
square fit might well indicate the relative fit of the various items
within a test; however, mean square values probably should not be
used to compare items from one test to the next, nor should one attempt
to interpret their absolute magnitudes.

The problem of interpreting the mean squares is a general problem
in statistical hypothesis testing. The role of large samples in
rejecting null hypotheses is well known. For any difference between
data and an hypothesis, most statistical tests will lead to the rejection
of the null hypothesis if the sample is large enough. The Rasch
project samples are adequate for rejecting almost any difference as a
significant departure from the model, even when the difference is of no
practical consequence. For example, the mean square fits are based on
the difference between expected and obtained proportions for each
item-by-score group cell entry. The model specifies an expected proportion
correct, $p_e$, while we obtain $p_o$. With a fairly large sample size, N, we
probably estimate $p_o - p_e$ fairly accurately. The test is

$$z^2 = \frac{N (p_o - p_e)^2}{p_e (1 - p_e)}$$

These are summed over score groups and averaged to yield the mean
squares for items. Now, what if $p_o - p_e$ is estimated with great accuracy
for a sample of size N, but we use an even larger sample, of size 10N?
Then clearly, $z^2$ will increase ten-fold.


5. Such an interval is consistent with previous studies that have dealt
   with the amount of slope deviation the model will tolerate. See
   Panchapakesan (1969), C. Rentz (1975).


TABLE 2.6.1
Slope Indexes of Fit and Frequency
Distributions of Slope Values for the
Primary Form Vocabulary Tests

              Slope Index              Frequency Distribution (3)              Number
Test Name     Q (1)  %±.2 (2)    .6   .7   .8   .9  1.0  1.1  1.2  1.3  1.4 >1.4  of Items

CAT 3-A       .153    65.0        2    1    4    4    8    8    6    4    0    3     40
CAT 4-A       .227    47.5        4    1    3    7    1    4    7    4    4    5     40
CTBS 2-Q      .179    52.5        2    2    4    3    7   10    1    4    2    5     40
CTBS 3-Q      .177    50.0        1    4    3    3    7    8    2    5    3    4     40
ITBS 10-5     .116    57.8        0    4    3    2    7    9    4    4    3    2     38
ITBS 11-5     .145    62.8        1    2    3    6    6    8    7    6    1    3     43
ITBS 12-5     .214    56.5        3    1    3    9    6    7    4    4    4    5     46
MAT E-F       .146    56.0        4    2    4    3    6   12    7    6    4    2     50
MAT I-F       .161    56.0        0    5    6    9    6    6    7    5    4    2     50
STEP 4-A  [illegible] 46.6        4    0    4    1    6    6    1    5    1    2     30
SRA BL-E      .173    57.1        2    2    1    4    8    6    6    6    1    6     42
SRA GR-E      .228    47.6        1    3    6    5    3    8    4    3    4    5     42
SAT I-W       .167    63.2        1    0    4    8    4   11    1    4    3    2     38
SAT II-W      .186    52.1        2    4    3    4    5    9    7    4    5    5     48

(1) Q = the semi-interquartile range
(2) The percentage of items in the interval .80-1.20
(3) Column heading represents upper limit of interval
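
The dependence of this statistic on sample size can be shown with a short numerical sketch. The proportions and sample sizes below are hypothetical; the function simply evaluates N times the squared difference between obtained and expected proportions, divided by the binomial variance term of the expected proportion.

```python
def z_squared(n, p_obs, p_exp):
    """Squared standardized difference between obtained and expected proportions."""
    return n * (p_obs - p_exp) ** 2 / (p_exp * (1.0 - p_exp))


# Hypothetical cell: the model expects 60 percent correct, 62 percent is observed.
p_exp, p_obs = 0.60, 0.62
z2_n = z_squared(1000, p_obs, p_exp)      # sample of size N
z2_10n = z_squared(10000, p_obs, p_exp)   # sample of size 10N
ratio = z2_10n / z2_n
```

The same two-point discrepancy yields a statistic ten times as large in the larger sample, which is why the absolute size of a mean square says little by itself about the practical importance of the misfit.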

As an index of test fit some appropriate average of the item mean
squares could be used if some adjustments are made for sample size, and
then only if information about relative differences between tests is
desired (it is doubtful that any adjustment in these indexes could
make their magnitudes meaningful). Table 2.6.2 shows four such indexes
based on item mean square fit values. The mean and median for each
test are shown along with adjusted values in order to eliminate effects
of differences in sample size. The factor (10,500/N) was used to adjust
each mean and median (10,500 being the smallest sample size for the tests
considered).
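
The sample size adjustment amounts to one multiplication, sketched below. The function name and the example values (a mean square of 15.0 from a sample of 21,000) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
REFERENCE_N = 10500  # smallest sample size among the tests considered


def adjust_mean_square(value, n):
    """Rescale a mean square fit value to the reference sample size."""
    return value * REFERENCE_N / n


# Hypothetical test: average mean square of 15.0 based on 21,000 examinees.
adjusted = adjust_mean_square(15.0, 21000)
```

A test calibrated on exactly 10,500 cases is left unchanged by the adjustment, while tests with larger samples have their mean squares scaled down in proportion.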

It is likely that as an index of test fit the item mean squares
would be a defensible choice since any factor that might cause misfit
would be reflected in the mean squares. The principal disadvantage of
the mean squares is the interpretation of their magnitude. Cartledge
(1974) used the same sample size adjustment that was used here (10,500/N)
on data simulated to reflect different levels of model fit. Fit was
controlled by manipulating the range of item discrimination parameters
in the two parameter logistic model from "zero variation" to "high
variation." The zero variation condition would provide a high degree
of fit to the model while the high variation condition would have a low
degree of fit. Her findings of importance here relate to the size of
the average item mean squares (adjusted), since they provide us with some
guidelines for interpreting the values obtained in our own work. Cartledge
found average mean squares of about 15.0 over several replications of
her nonfit condition, and for the fit conditions the average was about
2.0. Our own values range from 5.4 to 10.9 and, except for the one test
whose value is 10.9, the indexes are rather homogeneous among the Vocabulary
Tests. These comparisons show that these tests do not differ among
themselves, yet they are neither very good nor very bad fitting tests.


TABLE 2.6.2
Average Item Mean Square Fit Values for
the 14 Primary Form Vocabulary Tests

Test Name        Mean     Median      Mean*    Median*

CAT 3-A          15.3       9.1        6.9       4.1
CAT 4-A          11.9       8.9       10.9       8.1
CTBS 2-Q         18.3      14.5        8.7       6.9
CTBS 3-Q          9.7       5.5        9.2       5.2
ITBS 10-5         7.0       5.0        7.0       5.0
ITBS 11-5         7.0       3.9        6.7       3.8
ITBS 12-5         8.1       4.4        7.7       4.2
MAT E-F           8.8       5.2        8.2       4.9
MAT I-F          11.6       9.8        5.4       4.6
STEP 4-A         30.0      16.8        9.7       5.4
SRA BL-E         15.5      10.0        7.5       4.8
SRA GR-E          8.2       6.3        7.1       5.4
SAT I-W           7.4       5.8        7.4       5.8
SAT II-W         12.9       9.2        6.2       4.4

*Adjusted by factor (10,500/N)


TABLE 2.6.3
Adjusted Median Mean Square Fit Values
for All Tests in the Data Base

Test Name      Vocabulary    Comprehension      Total

CAT 3-A           4.102          5.623          2.472
CAT 3-B           6.692          6.163          5.075
CAT 4-A           8.061          4.692          5.075
CAT 4-B           9.545         10.298          3.882

CTBS 2-Q          6.863          3.922          2.933
CTBS 2-R          8.467          9.166          5.916
CTBS 3-Q          5.177          3.973          3.481
CTBS 3-R         11.084         10.893          8.106

ITBS 10-5         4.984          4.670          2.768
ITBS 10-6         8.923          7.322          6.340
ITBS 11-5         3.764          2.365          2.328
ITBS 11-6         7.431          7.673          6.290
ITBS 12-5         4.163       [illegible]       2.244
ITBS 12-6      [illegible]       6.176          5.665

MAT E-F           4.943          8.527          3.897
MAT E-G          10.128         13.115          8.749
MAT I-F           4.599          6.477          3.455
MAT I-G           7.092          5.884          5.069

STEP 4-A          5.4?3          3.596          3.108
STEP 4-B       [illegible]       5.402          6.517

SRA BL-E       [illegible]       4.615          3.911
SRA BL-F          6.695          8.411          5.590
SRA GR-E          5.443          3.444          2.678
SRA GR-F         10.529          9.698          8.718

SAT I-W           5.795          4.609          3.134
SAT I-X          11.319         11.341          9.517
SAT II-W          4.433          2.815          2.617
SAT II-X          8.177          6.319          5.233

A comparison of the values in Table 2.6.2 indicates that the two
indexes, mean and median, do not rank the tests in exactly the same
order; the rank order correlation between them is 0.684. This relatively
low relationship is probably due to the mean's sensitivity to peculiarities
in the distribution, for example, two or three extreme values.
As an index of test fit the median is probably more desirable, at least
for the purposes of comparing tests. Table 2.6.3 shows the median item
mean square fit index for each of the tests used in this study, for
vocabulary, comprehension, and total reading.
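
A rank order correlation of the kind quoted above can be computed as a Pearson correlation between ranks, with tied values sharing their mean rank. The sketch below assumes that convention (the report does not state which formula was used) and runs on hypothetical values, not on the entries of Table 2.6.2.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_order_correlation(x, y):
    """Spearman rank order correlation, computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean and median fit values for five tests.
means = [6.0, 9.0, 7.5, 12.0, 8.0]
medians = [4.0, 5.5, 5.5, 9.0, 5.0]
rho = rank_order_correlation(means, medians)
```

Because a single extreme item value can move a test's mean without moving its median, the two indexes can order the same tests differently, and the coefficient falls below 1 whenever they do.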

2.7 Relationship of Test Fit Indexes

A variety of indexes of test fit have been presented in this
chapter. Some have dealt with the extent to which the items in the tests
conform to the antecedent conditions necessary for model-data fit;
others have dealt with the tests' achievement of specified consequent
conditions. In this section we have summarized some of the more
important indexes of fit by presenting them together in Table 2.7.1
for the 14 vocabulary tests, with an indication of whether they apply
to antecedent or consequent conditions.

A new index is also presented in this table. It is an index of
first factor concentration, and is labelled "% 1st Factor." The index
was derived from a principal components analysis of the item intercorrelation
matrices of the 14 vocabulary tests. The index is the percentage
of variance accounted for by the first component.
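
The "% 1st Factor" index can be illustrated with a small sketch that finds the largest eigenvalue of a correlation matrix by power iteration and expresses it as a percentage of the total variance (the number of items). Power iteration is used here only to keep the example free of external libraries; it is not a claim about how the report's analysis was computed, and the 4 x 4 matrix is hypothetical.

```python
def first_component_percent(corr, iterations=500):
    """Percentage of variance carried by the first principal component.

    corr is a square correlation matrix given as a list of lists. The
    largest eigenvalue is found by power iteration starting from a vector
    of ones, and the total variance equals the number of variables.
    """
    k = len(corr)
    v = [1.0] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        cv = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        eigenvalue = sum(v[i] * cv[i] for i in range(k))  # Rayleigh quotient
    return 100.0 * eigenvalue / k


# Hypothetical four-item test in which every pair of items correlates .50.
corr = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
pct_first = first_component_percent(corr)
```

With equal intercorrelations r among k items, the first eigenvalue is 1 + (k - 1)r, so the example gives 2.5 of 4 units of variance, or 62.5 percent.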

In summary, the results of the studies reported in this chapter
are not unequivocal. The 14 vocabulary tests on which attention was
concentrated showed rather moderate test fit in terms of item statistics,
yet acceptable performance in the studies of stability. Indeed, these
tests display a high degree of homogeneity with respect to the various
indexes of fit.

We had hoped that these tests would be sufficiently different so
that by studying how the various indexes of model-data fit varied over
this collection of tests, we could learn more about how model-data fit
could be evaluated. The lack of noticeably strong variance between these
tests was disappointing with respect to the theoretical issues of how to
evaluate model-data fit; yet, for the practical task of equating, the
results were encouraging. Our conclusions about model-data fit are
cautiously optimistic and they will be presented in Chapter 5.


Chapter 3
Equating Methodology



The purpose of this section is to present techniques and results for
equating and estimating equating errors. We will present our general logic
and our specific techniques, as well as some sample data; however, first
we should consider general principles in Rasch equating.



3.1 General Principles

There are two basic references for consideration in regard to
fundamental logic. Angoff, in Thorndike's Educational Measurement,
presents the various experimental designs that one might employ to
equate two tests. He also presents details of equipercentile and linear
equating. The general logic and procedures of Rasch equating are in the
dissertation of Nargis Panchapakesan (1969).

We define "equivalent scores" as scores that correspond to the same
Rasch ability. Our definition is similar to the usual definition of
equivalent scores: "Two scores ... may be considered equivalent if
their corresponding percentile ranks in any given group are equal"
(Angoff, W. H. In Thorndike, R. L., Educational Measurement, p. 563).

In order to apply this definition, ability scales for tests to be
equated must be comparable. Thus, prior to actual raw score-to-raw
score equating, each test must be calibrated on the same scale.

A distinction between "calibration" and "equating" is helpful for
clarification. Angoff (1971, p. 565) has pointed out the need for this
distinction. "Calibration" refers to the assignment of abilities to
raw scores, whereas "equating" refers to the determination of equivalent
raw scores.

The next section outlines the procedures for determining equating 
constants. These constants are used to calibrate all tests on a common 
scale. After this calibration is completed, then raw score-to-raw score 
equating is accomplished by applying the definition stated previously. 
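
Once two tests are calibrated on a common ability scale, the definition of equivalent scores can be applied by matching abilities. The sketch below pairs each raw score on one test with the base-test raw score whose calibrated ability is nearest. The nearest-ability rule, the function name, and the ability values are all assumptions made for illustration; the report's own equating program may interpolate or round differently.

```python
def equate_raw_scores(abilities_x, abilities_base):
    """Map raw scores on test X to base-test raw scores of nearest ability.

    Both arguments map raw score -> ability, with the two ability scales
    already placed on a common origin by an equating constant.
    """
    table = {}
    for score_x, theta in abilities_x.items():
        table[score_x] = min(
            abilities_base,
            key=lambda score: abs(abilities_base[score] - theta),
        )
    return table


# Hypothetical calibrated score-to-ability tables for two short tests.
abilities_x = {1: -1.0, 2: 0.0, 3: 1.0}
abilities_base = {1: -1.5, 2: -0.7, 3: 0.1, 4: 0.9}
equivalents = equate_raw_scores(abilities_x, abilities_base)
```

In this toy case a raw score of 2 on test X corresponds to an ability of 0.0, and the base-test score closest to that ability is 3, so those two raw scores are treated as equivalent.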

There are two general equating procedures developed to conform to 
Rasch theory. Both yield a single additive constant for a test pair. 
This constant adjusts the ability scale on one test to that of another. 

We refer to one as the "ability method." In this technique, the
test pair is administered to a single group of persons. Each test in the
pair is analyzed separately by a Rasch analysis. Since the group's ability
is not different on the two tests, any perceived difference in average
ability must be due to the differences in scale origin on the two tests.
The appropriate equating constant is the difference in the ability
averages. One of the pair is chosen as a reference test. The constant
is applied to the second test to adjust its scoring table to conform
to that of the reference test.
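
The ability method can be sketched in a few lines. The direction of subtraction (reference minus second test) and the ability values are assumptions for illustration; the only point carried over from the text is that the constant is the difference between the two average abilities of the same group.

```python
def ability_method_constant(abilities_reference, abilities_other):
    """Equating constant: mean ability on the reference test minus
    mean ability of the same persons on the other test."""
    mean_ref = sum(abilities_reference) / len(abilities_reference)
    mean_other = sum(abilities_other) / len(abilities_other)
    return mean_ref - mean_other


# Hypothetical ability estimates for the same four persons on two tests,
# each test having been analyzed separately.
on_reference = [0.2, 0.8, 1.4, -0.4]
on_other = [-0.1, 0.5, 1.1, -0.7]
constant = ability_method_constant(on_reference, on_other)

# Adding the constant puts the other test's abilities on the reference scale.
adjusted_other = [theta + constant for theta in on_other]
```

After the adjustment the group has the same average ability on both tests, which is the condition the method imposes.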

The second equating procedure we call the "difficulty method."
The difficulty method is based on a Rasch analysis of both tests combined
into one long test. If the two tests in the pair have the same
scale origin, then the averages of log easiness for both tests will be
equal. To the degree that these averages are not equal, the two tests
separately have scale origin differences. The equating constant is the
difference in average log easiness. This constant is added to the
abilities generated by one of the tests to put this test on the same
scale as the other test.

The equating methodology to be described in the following section
is not dependent on whether the "ability" or "difficulty" method is
chosen. With either choice, the differences obtained are the basic data
for following the procedures for obtaining the final recommended equating
constants.
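
The difficulty method described above can be sketched in the same style as the ability method. The item easiness values are hypothetical, and the sign convention (reference minus other, so that a harder test receives a positive constant added to its abilities) is an assumption chosen to agree with the report's later statement that a test harder than STEP is adjusted by adding a positive constant.

```python
def difficulty_method_constant(easiness_reference, easiness_other):
    """Equating constant from a combined calibration of two tests.

    Both arguments hold log easiness estimates for the items of one test,
    taken from a single Rasch analysis of the two tests treated as one
    long test. The result is positive when the other test is harder.
    """
    mean_ref = sum(easiness_reference) / len(easiness_reference)
    mean_other = sum(easiness_other) / len(easiness_other)
    return mean_ref - mean_other


# Hypothetical item easiness estimates from one combined analysis.
reference_items = [0.5, 0.1, -0.3]
other_items = [-0.2, -0.4, 0.0]
constant = difficulty_method_constant(reference_items, other_items)
```

Either method produces one additive constant per test pair, so the later averaging steps operate on the same kind of number regardless of which method supplied it.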

Equating constants based on both methods were determined. However,
the results of the difficulty method are stressed and recommended. This
choice is largely based on the assumption that these results are more
stable. The expected stabilization occurs through the analysis of the
combined test data in the difficulty method. The combined data analysis
should yield results based on a commonly measured trait. On the other
hand, the ability method requires separate analyses of each test. These
separate analyses lead to a need to depend entirely on an assumption of
equivalence. However, both methods theoretically lead to the same
results, so the choice is somewhat arbitrary. Standard errors of equating
constants are also expected to be similar. This similarity is discussed
more fully later in this chapter.

STEP vocabulary was chosen as the reference test for all equatings.
STEP was chosen because it has a single level which was given at all
grades, and thus was paired with all other tests studied. Tables are
presented in this report in which each test is equated into all other
tests, but equating was conducted using STEP to define the origin of
the calibrated scales of all tests. The choice of a reference test is
arbitrary as only additive constants are involved.


3.2 Estimating Equating Constants: Methodology and Results

In order to avoid irregularities due to sampling in the individual
sets of data, constants are based on averages over several sets of data.
These calculations will be explained in detail. Throughout this section,
the difficulty method is employed, so no distinction is made between
ability or difficulty adjustments.

Data were analyzed by level and form without regard to grade level.
However, equating constants were developed by grade level. For example,
if a particular test was designed for both the fourth and fifth grades,
then we did not separate the data for this test by grade. Data from
both grades entered into the calculations of fourth grade equating
constants and fifth grade equating constants. After determining both
constants, averages over the grades were calculated, where appropriate.

For a particular test pair, data on both tests were calibrated
treating the tests as one long test. The combined analysis yields
easiness estimates for each item in both tests. The average of the
easiness was determined for each of the two tests in these combined-test
analyses. A difference in these averages was calculated for each test
pair.

The average differences for primary forms equatings were organized
into 7 x 7 matrices. There were six of these 7 x 7 matrices: namely,
vocabulary tests at each of three grade levels and comprehension tests
at each of three grade levels.

The 7 x 7 difference matrices included zeros on the diagonals, as
parallel forms adjustments were determined separately from test-to-test
adjustments. The seven rows and columns each correspond to one of the
test batteries. The matrix entries above the diagonals are the log
easiness average differences between the test pair when the pair was
administered in one of the orders of testing. The entries below the
diagonals were based on the data obtained in the other order of testing.
These difference matrices are presented in Tables 3.2.1 to 3.2.6. Each
difference in each table is calculated by subtracting the average
easiness of the test identified by the column heading from the average
easiness of the test identified by the row heading.

Our final recommended results followed the procedure to be discussed
only for equating vocabulary tests. The 7 x 7 matrices were used to
equate comprehension tests, but the results were not our recommended
ones. The reasons for this change in procedure will be given subsequently.
The 7 x 7 matrices for comprehension are presented here partly because
they do yield valid equating constants, but primarily because they provide
basic data for readers who wish to compare the difficulty of various
reading comprehension tests.

The analysis proceeded by calculating row and column means. The
zero diagonal values are included in the calculation of these means as
these correspond to the difference between a test and itself. (Differences
between the column mean vector and the row mean vector are due
to order-of-testing effects and sample differences.)

The remaining calculations can be illustrated by example. Consider
the first table, fourth grade vocabulary. The marginal means are
presented below.

Test               1        2        3        4        5        6        7

Row Means       -.219     .093    -.172     .892     .142    -.143    -.444
Column Means     .222    -.026     .239    -.853    -.042     .160     .449

The signs of corresponding numbers differ since they correspond to
reverse orders of test administration and, therefore, to reverse orders
of subtraction. The -.219 and the +.222 both mean that the average
easiness of test 1 is less than the overall average easiness. That
is, test 1 is more difficult than the average.


TABLE 3.2.1: FOURTH GRADE VOCABULARY DIFFERENCE MATRIX

TEST         1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT     MEAN

 1          0.    -.258     .004   -1.171    -.246    -.053     .190    -.219
 2        .293       0. [illeg.] [illeg.]     .070     .215     .572     .093
 3        .031    -.299       0.   -1.022    -.256     .087     .256    -.172
 4       1.155     .863    1.081       0.     .928     .924    1.290     .892
 5        .313     .138     .443    -.769       0.     .293     .576     .142
 6        .011    -.188     .063    -.929    -.211       0.     .256    -.143
 7       -.250    -.438    -.204   -1.294    -.578    -.346       0.    -.444
MEAN      .222    -.026     .239    -.853    -.042     .160     .449     .021


TABLE 3.2.2: FIFTH GRADE VOCABULARY DIFFERENCE MATRIX

TEST         1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT     MEAN

 1          0.    -.258     .712     .419    -.246    -.053    1.212     .255
 2        .293       0.     .809     .633     .070     .215    1.652     .525
 3       -.605    -.707       0.    -.114    -.737    -.499     .758    -.272
 4       -.450    -.502     .217       0.    -.482    -.347     .849    -.102
 5        .313     .138     .917     .673       0.     .293    1.372     .529
 6        .011    -.188     .558     .437    -.211       0.    1.222     .263
 7      -1.439   -1.592    -.922    -.851   -1.438   -1.266       0.   -1.072
MEAN     -.263    -.444     .329     .171    -.435    -.237    1.009     .018


TABLE 3.2.3: SIXTH GRADE VOCABULARY DIFFERENCE MATRIX

TEST         1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT     MEAN

 1          0.     .007    -.029    -.867   -1.522    -.486    -.021    -.417
 2       -.025       0.    -.104    -.896   -1.284    -.421    -.021    -.390
 3        .042     .115       0.    -.763   -1.361    -.338     .056    -.321
 4        .956     .900     .854       0.    -.482     .356     .849     .490
 5       1.544    1.611    1.528     .673       0.    1.082    1.372    1.116
 6        .506     .482     .368    -.413    -.958       0.     .466     .064
 7       -.015     .073    -.048    -.851   -1.438    -.429       0.    -.387
MEAN      .430     .455     .367    -.445   -1.006    -.034     .389     .022


TABLE 3.2.4: FOURTH GRADE COMPREHENSION DIFFERENCE MATRIX

TEST         1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT     MEAN

 1          0.    -.063     .276    -.402     .364     .312     .320     .115
 2        .262       0.     .399    -.269     .478     .446     .435     .250
 3       -.145    -.351       0.    -.713     .113     .063     .031    -.143
 4        .625     .516     .784       0.     .874     .630     .801     .604
 5       -.281    -.383     .050    -.751       0.    -.055     .006    -.202
 6       -.127    -.271     .149    -.533     .127       0.     .074    -.083
 7       -.092    -.298     .116    -.503     .055     .022       0.    -.100
MEAN      .035    -.122     .253    -.453     .287     .203     .238     .063


TABLE 3.2.5: FIFTH GRADE COMPREHENSION DIFFERENCE MATRIX

TEST         1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT     MEAN

 1          0.    -.063     .771     .485     .364     .312     .905     .396
 2        .262       0.     .818     .552     .478     .446    1.066     .517
 3       -.568    -.721       0.    -.197    -.164    -.332     .251    -.247
 4       -.244    -.340     .295       0.     .186    -.047     .529     .054
 5       -.281    -.383     .355     .064       0.    -.055     .508     .030
 6       -.127    -.271     .494     .226     .126       0.     .636     .155
 7       -.740    -.873    -.153    -.354    -.406    -.480       0.    -.429
MEAN     -.243    -.379     .369     .111     .084    -.022     .556     .068


TABLE 3.2.6: SIXTH GRADE COMPREHENSION DIFFERENCE MATRIX

TEST         1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT     MEAN

 1          0.    -.180     .184    -.710    -.665    -.308    -.249    -.275
 2        .407       0.     .433    -.397    -.316     .042    -.048     .017
 3       -.024    -.313       0.    -.834    -.714    -.434    -.441    -.394
 4        .893     .585    1.020       0.     .186     .462     .529     .525
 5        .897     .574     .916     .064       0.     .477     .508     .491
 6        .494     .060     .531    -.443    -.308       0.     .095     .061
 7        .461     .198     .581    -.354    -.406     .152       0.     .090
MEAN      .447     .132     .524    -.382    -.317     .056     .056     .074

The row and column means are averaged to get an adjustment averaged
over testing order. This averaging is accomplished by changing the
sign of the row means prior to averaging. We get the following
averages:

Test            1        2        3        4        5        6        7

Averages      .221    -.059     .206    -.872    -.092     .151     .447

All tests are put on the STEP scale by adding .092 to each average
(test 5 is STEP). This correction puts all tests on one scale with an
origin defined by the STEP origin. We obtain the following set of equating
constants:

Test            1        2        3        4        5        6        7

Constants     .313     .033     .298    -.780      0       .243     .539
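
The arithmetic of this worked example can be reproduced from the fourth grade vocabulary marginal means quoted in the text. The function name is hypothetical; the steps (reverse the sign of the row means, average with the column means, then shift so the reference test is zero) follow the description above. Because the published means are rounded to three decimals, the results agree with the printed constants only to within about .001.

```python
TESTS = ["CAT", "CTBS", "ITBS", "MAT", "STEP", "SRA", "SAT"]

# Marginal means for fourth grade vocabulary, as given in the text.
ROW_MEANS = [-.219, .093, -.172, .892, .142, -.143, -.444]
COL_MEANS = [.222, -.026, .239, -.853, -.042, .160, .449]


def equating_constants(row_means, col_means, reference_index):
    """Average sign-reversed row means with column means, then shift the
    scale so that the reference test has a constant of zero."""
    averages = [(col - row) / 2.0 for row, col in zip(row_means, col_means)]
    shift = averages[reference_index]
    return [avg - shift for avg in averages]


constants = equating_constants(ROW_MEANS, COL_MEANS, TESTS.index("STEP"))
for name, value in zip(TESTS, constants):
    print(f"{name:5s} {value:+.3f}")
```

Subtracting the STEP average of -.092 is the same operation as adding .092 to every average, which is how the text describes it.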

The process is repeated for grades five and six. The constants at
this point are presented in Tables 3.2.7 and 3.2.8. The next adjustment was
to average constants for tests that were administered at more than one
grade level. These averaged coefficients appear between the values used
to compute the averages and are underlined. The ".313" means that test
1 is harder than STEP by .313 log units. Thus, we adjust test 1 by
adding .313 to each of its ability estimates to put each on the STEP
scale.

TABLE 3.2.7: A SET OF EQUATING CONSTANTS FOR VOCABULARY*

                                      TEST
GRADE        1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT

  4       .313     .033     .298    -.780       0      .243     .539
        _.267_   _.016_                               _.238_
  5       .221    -.002     .782     .619       0      .232    1.523
                                    _.606_                     _1.486_
  6      1.484    1.489    1.405     .593       0     1.013    1.450

* Underlined numbers are the averages of the numbers immediately above
  and below them. Averages are calculated when a test is used at two
  grade levels.


TABLE 3.2.8: A SET OF EQUATING CONSTANTS FOR COMPREHENSION*

                                      TEST
GRADE        1        2        3        4        5        6        7
           CAT     CTBS     ITBS      MAT     STEP      SRA      SAT

  4      -.285    -.431    -.046    -.773       0     -.101    -.075
       _-.315_  _-.453_                              _-.108_
  5      -.346    -.475    +.281    +.002       0     -.116    +.466
                                   _-.024_                     _+.427_
  6      +.765    +.461    +.863    -.050       0     +.401    +.387

* Underlined numbers are the averages of the numbers immediately above
  and below them. Averages are calculated when a test is used at two
  grade levels.


The next step is to obtain equating constants for parallel (secondary)
forms to get these tests on the same scale as the primary forms. This
step consists of adding an appropriate log easiness mean difference to
the primary form equating constant. Data were obtained for each secondary
form in only two data sets. That is, secondary forms were paired with
their respective primary form for two orders of testing. Thus, each
secondary form equating constant consists of adding to the primary
form equating constant the average of the mean easiness differences found
in the two data sets.

The comprehension test equating now needs to be explained. If we
were to use the results of the six 7x7 tables for bringing comprehension
onto the STEP vocabulary scale, then it would be accomplished by adding
to each comprehension equating constant the value .799. This number is
the difference in average easiness for STEP vocabulary and comprehension
items when these items are calibrated together on a total test analysis.
It reflects the fact that comprehension is more difficult than vocabulary
on STEP.

This procedure was revised, however, in our final product. The
result of the +.799 adjustment would be to slide all comprehension tests
the same amount without regard to their respective vocabulary test. It
was our opinion that battery-by-battery adjustments would be superior in
that the relationship between vocabulary and comprehension would be
maintained for each battery. Therefore, comprehension was adjusted to
vocabulary for each test using the same procedure that was used to find
parallel forms equating constants. This had the added benefit of allowing
adjustments to be made on the basis of all data added together (our base
files), rather than on the basis of the three 7 x 7 matrices, since
vocabulary and comprehension pairs were always administered together.

Our final result is presented in Table 3.2.9. This table presents
the equating constants that were recommended for use.

3.3 Raw Score Equating

The coefficients in Table 3.2.9 were entered into our equating program
to yield raw score equating tables. (These are included in Appendices
C and D.) The logic of the equating procedure follows. Let $B_i$ be the
raw scores on the base test. Let $A_{B_i}$ be the corresponding abilities
in the base test scoring table after equating adjustments. Let $C_j$
be the raw scores on the test to be equated to the base test, and let
$A_{C_j}$ be the corresponding abilities in the scoring table of this test
after equating adjustments. Then for each possible score $B_i$ on the
base test, we find the score $C_j$ on the test to be equated such that

$$|A_{B_i} - A_{C_j}|$$

is a minimum. The score, $C_j$, that minimizes $|A_{B_i} - A_{C_j}|$ is the
equivalent score.
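The minimum-difference rule can be sketched as follows. This is a minimal Python illustration, not the report's equating program: the function name, the two scoring tables, and the tie-breaking rule (the lower raw score wins) are all assumptions made for the example.

```python
def equate_raw_scores(base_table, other_table):
    """Map each base-test raw score to the closest raw score on another test.

    Both arguments map raw score -> ability (log units) after the equating
    constant has been added. Ties go to the lower raw score (an assumption).
    """
    equivalents = {}
    for b_score, b_ability in sorted(base_table.items()):
        equivalents[b_score] = min(
            sorted(other_table),
            key=lambda c: abs(b_ability - other_table[c]),
        )
    return equivalents


# Hypothetical scoring tables, already on a common scale.
base = {8: -0.40, 9: -0.10, 10: 0.25}
other = {7: -0.45, 8: -0.15, 9: 0.10, 10: 0.30, 11: 0.60}
print(equate_raw_scores(base, other))
```

Because only the scale origins differ between tests, the lookup needs nothing beyond the two adjusted scoring tables.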

3.4 Error Problems

We recognize several sources of error entering into the application
of equating tables. The major source of error is test unreliability.
This particular error source appears to be much more severe than other
error sources and is usually on the order of .3 of a log unit. That is to
say, the standard error of measurement for these tests is between 2.5
and 3.5 raw score units, which is usually about .3 log ability units.
Moreover, conditional raw score standard deviations for equated tests
are in the range of three to six raw score points.

TABLE 3.2.9: EQUATING CONSTANTS RECOMMENDED FOR USE*

TEST   TEST
DSN    NAME        VOCABULARY   COMPREHENSION     TOTAL

  1    CAT3A          +.267         +.437         +.354
  2    CAT3B          -.181         -.067         -.123
  3    CAT4A         +1.484        +1.732        +1.615
  4    CAT4B         +1.139         +.814         +.967
  5    CTBS2Q         +.016         +.262         +.146
  6    CTBS2R         -.508         -.598         -.556
  7    CTBS3Q        +1.489        +1.437         +.906
  8    CTBS3R        +1.294         +.985        +1.130
  9    ITBS10-5       +.298         +.664         +.533
 10    ITBS10-6       +.249         +.369         +.326
 11    ITBS11-5       +.782        +1.125        +1.000
 12    ITBS11-6       +.530         +.499         +.510
 13    ITBS12-5      +1.405        +1.859        +1.688
 14    ITBS12-6      +1.283        +1.415        +1.365
 15    MATEF          -.780         -.172         -.492
 16    MATEG          -.892         -.199         -.564
 17    MATIF          +.606         +.802         +.699
 18    MATIG          +.580         +.762         +.666
 19    STEP4A            0          +.799         +.399
 20    STEP4B         -.152         +.510
 21    SRABE          +.238         +.506         +.381
 22    SRABF          -.137         -.161         -.150
 23    SRAGE         +1.013        +1.348        +1.192
 24    SRAGF          +.745         +.510         +.620
 25    SATIW          +.539         +.591         +.571
 26    SATIX          +.439         +.495         +.473
 27    SATIIW        +1.486        +1.618        +1.411
 28    SATIIX        +1.638        +1.663        +1.652

* The equating constant for a test is to be added to all abilities
estimated from that test to yield abilities on a scale the origin
of which is the origin of STEP 4A Vocabulary.

There is some instability in the equating constants themselves.
This error source is minor, and is of the order .02 of a log ability
unit. This error source will be developed fully in the last section
of this chapter.

A third type of instability is in the assignment of abilities to
raw scores. This is a major problem and is called "stability of Rasch
ability parameter estimates." Chapter 2 of this report dealt specifically
with demonstrating this stability over various sample sizes and
sample compositions. Our conclusion drawn from Chapter 2 was that
sufficient stability is present over types of data sets to allow
confidence in the equating results.

There is a fourth error source of importance that we choose to call
"assignment error". This is the error associated with assigning a
raw score on one test as equivalent to a raw score on another test. If
we were reporting all scores on a common log ability scale, then there
would be no assignment error. Assignment error occurs by having to
assign a child a raw score on an equated test that is most equivalent
to a raw score on a base test. A hypothetical example follows.
Consider this partial table.

LOG ABILITY     BASE TEST     EQUATED TEST

   -3.2            10
   -3.0                           10
   -2.8
   -2.6             9              9
   -2.4             8              8

A child who receives a 10 on the base test should be given an ability
of -3.2. This estimate would contain error of measurement and a slight
error of estimating the equating constant. However, he must be assigned
an equated test raw score of 10, since this equated test score most closely
estimates his ability. His assignment error is .2 due to the need to
assign a raw score. Tables of assignment errors corresponding to each
equating table are presented in Appendices E and F.
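The hypothetical example can be checked with a short Python sketch. The function name is illustrative, and placing the equated test's score of 10 at -3.0 is an assumption about the partial table; the text itself fixes only the .2 difference.

```python
def assignment_error(base_ability, equated_table):
    """Return (equated raw score, assignment error) for one base-test ability.

    equated_table maps raw score -> ability on the common log scale.
    """
    score = min(
        sorted(equated_table),
        key=lambda s: abs(base_ability - equated_table[s]),
    )
    return score, abs(base_ability - equated_table[score])


# Equated-test column of the partial table (the -3.0 row is assumed).
equated = {10: -3.0, 9: -2.6, 8: -2.4}
score, error = assignment_error(-3.2, equated)
print(score, round(error, 1))
```

Reporting the ability of -3.2 directly, rather than the raw score of 10, would leave no assignment error at all.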

It is our opinion that the instability of equating constants is
inconsequential. Assignment errors are not inconsequential, but are
not overly severe. Scoring table stability is also not inconsequential.
Such stability follows from designing tests that meet the assumptions
of the Rasch model, or by demonstrating empirically sufficient stability
for existing tests as we did in Chapter 2. The major source of error
will remain the usual instability of the individual raw score. The
assignment error can be avoided by not using raw score-to-raw score
equating, but by calibrating raw scores to an ability scale common to
all instruments. This latter alternative we have provided in the form of
our "National Reference Scale", presented in Volume II of this report.

The final section of this presentation is a development of
equating constant error estimation procedures. This final section will
demonstrate that assignment error and the usual measurement error are
the error sources of real concern.

3.5 Vocabulary Constant Error Variance

Estimates of equating constant standard errors are based on the
standard errors of easinesses. The easiness standard errors are combined
using the usual addition of uncorrelated variances formula

$$V\Big(\sum_i a_i x_i\Big) = \sum_i a_i^2 \, V(x_i).$$

The procedure to be outlined can be modified for any equating design
(including ability equating), but this discussion will follow our difficulty
equating design involving the 7x7 difference matrices.

Let $d_i$ be the easiness estimates and $V(d_i)$ be the corresponding
variances from the Wright-Panchapakesan analyses. We start equating by
averaging the $d_i$'s to yield $\bar{d}$'s for each test in each of the
cells in the 7x7 matrix. The variance of $\bar{d}$ is

$$V(\bar{d}) = \frac{1}{k^2} \sum_{i=1}^{k} V(d_i),$$

where k is the number of items. There are two $\bar{d}$'s in each cell of
the matrix. The difference between these two $\bar{d}$'s has a variance

$$V(\bar{d}_1 - \bar{d}_2) = \frac{\sum_i V(d_{i1})}{k_1^2} + \frac{\sum_i V(d_{i2})}{k_2^2}.$$



Denote each of these values by $V_{ij}$, where the subscripts indicate the
cell in the 7 x 7 matrix. (Note: The diagonals of these matrices are
empty. Parallel forms constants are treated separately.) We now
obtain row and column averages parallel to our averaging process in the
equating procedure.

$$\text{Row average variance} = V_{i\cdot} = \frac{1}{49} \sum_{j=1}^{7} V_{ij}$$

$$\text{Column average variance} = V_{\cdot j} = \frac{1}{49} \sum_{i=1}^{7} V_{ij}$$

Following a change of sign of one of the sets of marginal averages,
the corresponding row and column values are averaged. The variances of
these averages are

$$V_j = \frac{1}{4}\,(V_{j\cdot} + V_{\cdot j}), \qquad j = 1, \ldots, 7.$$



One of the $V_j$ corresponds to the reference test STEP. The STEP constant
is subtracted from each of the other constants to yield equating constants.
The variance of these constants is

$$V_j^* = V_j + V_{STEP}.$$

The reader will note from this result that the decision to reference
scales on STEP introduces a small, but unnecessary source of variation.
That is, one does not need to choose a reference test, but could choose
the overall average origin. If this latter choice is made, then $V_j$
instead of $V_j^*$ would be the solution at this point.

Some equating constants are averaged over two grades (in cases in
which a level is used in more than one grade). In these cases the two
$V_j^*$'s are added and the sum is divided by four. STEP has only one form
which is used in three grades. These three $V_{STEP}$'s are added and
the sum is divided by nine.

The general solution just presented involves the assumptions that
the standard errors of easinesses are reasonably accurate, that the
items are locally independent, and that the groups of examinees in the 
7x7 matrix design are experimentally independent. These independence 
conditions are not strictly met, since cell entries are used in both 
row and column averaging and some data appear in more than one 7x7 
matrix. This failure to achieve full independence of terms is not seen 
as affecting numerical values greatly; therefore, calculations are 
carried out as if all terms are experimentally independent. 

The input data, the easiness standard errors, will normally be 
obtained from the Wright-Panchapakesan analysis. However, they can be 
obtained from direct empirical techniques as was done in Chapter 2. 

3.6 Parallel Forms Constants Error Variances

Standard errors for parallel forms constants are based on two-sample
studies. We have a $V_{12}$ and a $V_{21}$, corresponding to the two
orders of administration. The two pairs of mean easinesses are averaged,
which corresponds to the variance of

$$\frac{V_{12} + V_{21}}{4}$$

for each of seven parallel forms. A final equating constant consists of
adding the parallel form constant to the primary form constant to yield
a final variance of

$$V_j^* + \frac{V_{12} + V_{21}}{4},$$

where the subscript j is omitted from the right hand term.

3.7 Comprehension Constant Error Variances

Comprehension equating constants can be estimated in the same manner
as vocabulary equating constants. If the comprehension equating constants
are estimated in that way, then error variances can be estimated in the
manner previously described. However, comprehension constants were
estimated by base file comprehension-vocabulary average difficulty
differences. Thus, the procedure used all data on a particular
test-form-level combination. The adjustments are the differences in averages
of difficulties between vocabulary items and comprehension items when total
tests are calibrated. The error variance of this adjustment is

$$V_{v-c} = \frac{\sum_i V(d_{iv})}{k_v^2} + \frac{\sum_i V(d_{ic})}{k_c^2}.$$

A simple summation of the $V(d_i)$ over all items would not reflect
differences in lengths of the two subtests.

The v-c difference is added to the vocabulary constant to yield the
comprehension equating constant on the STEP vocabulary scale. Thus, final
error variances for comprehension equating constants are the sums of the
corresponding $V_{v-c}$ and $V_j^*$.

3.8 Crude Estimates of Error Variances

The variance of the easiness of item i is approximately

$$V(d_i) \approx \Big[\sum_{n=1}^{N} p_n q_n\Big]^{-1},$$

where N is the sample size.

Since p and q range from 0 to 1 and p + q = 1, the values of pq
do not vary much. Almost all pq values will be in the range of 1/9 to
1/4. If we denote by c the average value of $(pq)^{-1}$, then c can be
estimated crudely to be in the range of 4 to 9. Therefore, we can
estimate $V(d_i)$ quite reasonably to be c/N, where 4 < c < 9.

For example, p's usually exceed .13. For p = .13, pq = .11 and
c = 9. The maximum pq occurs at p = .5, where pq = .25. For this
maximum, then, c = 4. Thus, we obtain

$$V(d_i) = c/N.$$

Then

$$V(\bar{d}) = \frac{1}{k^2}\,(kc/N) = c/kN.$$

And

$$V_{ij} = \frac{2c}{kN_{ij}},$$

or, if the k's are not equal,

$$V_{ij} = \frac{c}{N_{ij}} \Big(\frac{1}{k_1} + \frac{1}{k_2}\Big).$$

The row and column values $V_{i\cdot}$ and $V_{\cdot j}$ are

$$V_{i\cdot} = \frac{1}{49} \sum_{j=1}^{7} V_{ij},$$

and similarly for $V_{\cdot j}$.

Finally, we would have $V_j$ and $V_j + V_{STEP}$ to calculate.

If we take reasonable estimates of c, k, and N, we can get reasonable
estimates of the equating constant standard errors on the average. Take
k = 40 and N = 750 for all instances, and c = 9 to yield overestimates
of the standard errors of easinesses.

We have $V(d_i)$ = 9/750 = .012, and
$V(\bar{d})$ = 9/((40)(750)) = .0003.

Each $V_{ij}$ is (9/750)(2/40) = .0006.

And $V_{\cdot j}$ = (6/49)(9/750)(2/40) = (6/49)(.0006) = .0000735.

Also $V_{i\cdot}$ = .0000735.

$V_j$ = (2/4)(.0000735) = .0000367.

And the variance of an equating constant is
$V_j + V_{STEP}$ = .0000735,
yielding a standard error crude estimate of .0086.

Since c = 9 is conservative, we can take .01 as a reasonable estimate of
the standard error of a vocabulary primary form equating constant.
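The crude calculation can be reproduced step by step. The following Python sketch assumes, as the text does, that every cell has the same c, k, and N, so that the STEP variance equals every other $V_j$; the function and variable names are illustrative only.

```python
import math


def crude_primary_se(c=9.0, k=40, n=750, tests=7):
    """Crude standard error of a primary form equating constant."""
    v_item = c / n                    # V(d_i): variance of one item easiness
    v_mean = v_item / k               # V(d-bar): mean of k item easinesses
    v_cell = 2 * v_mean               # V_ij: difference of two such means
    # Row (or column) average: six filled cells, divisor 7 squared = 49.
    v_margin = (tests - 1) * v_cell / tests ** 2
    v_j = (v_margin + v_margin) / 4   # average of a row and a column value
    v_const = v_j + v_j               # subtracting STEP adds V_STEP (= V_j here)
    return math.sqrt(v_const)


print(round(crude_primary_se(), 4))
```

Lowering c toward 4 shrinks the estimate, which is why the value obtained with c = 9 is treated as an overestimate.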



3.9 Results of Applying Equations

The Vocabulary equating constants standard errors were estimated
by the procedures outlined. The following table (Table 3.9.1) shows these
standard errors for all primary and secondary Vocabulary tests.

Table 3.9.1 shows that the formula estimates of the standard errors of
primary forms (odd numbered DSN's) are all less than the crude estimate
of .0086.



TABLE 3.9.1: VOCABULARY EQUATING CONSTANTS STANDARD ERRORS

TEST NAME        SE         TEST NAME        SE

CAT 3-A        .0046        MAT E-F        .0071
CAT 4-A        .0057        MAT E-G        .0090
CTBS 2-Q       .0076        MAT I-F        .0048
CTBS 3-Q       .0089        MAT I-G        .0059
ITBS 10-5      .0046        STEP 4-A       .0042
ITBS 11-5      .0059        STEP 4-B       .0059
ITBS 12-5      .0075        SRA B-E        .0045
MAT E-F        .0091        SRA B-F        .0057
MAT I-F        .0073        SRA G-E        .0073
STEP 4-A       .0085        SRA G-F        .0090
SRA BL-E       .0059        SAT I-W        .0071
SRA GR-E       .0080        SAT I-X        .0094
SAT I-W        .0076        SAT II-W       .0048
SAT II-W       .0036        SAT II-X       .0050

The secondary forms (even numbered DSN's) also yield formula
standard errors considerably smaller than the crude estimates. The
crude estimates of parallel forms equating constant error variances are

.0000734 + (2/4)(.0006) = .000373,

yielding standard errors of .019, or approximately .02. The formula
estimates are all below .01, but each is somewhat larger than its
corresponding primary form constant.
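The parallel forms figure follows from the variance formula of Section 3.6 with the crude values substituted. A short Python check, with illustrative variable names and the numbers taken from the text:

```python
import math

v_primary = 0.0000734   # crude variance of a primary form constant
v12 = v21 = 0.0006      # crude V_ij for each order of administration

# Final variance: primary form variance plus (V12 + V21) / 4.
v_secondary = v_primary + (v12 + v21) / 4
se_secondary = math.sqrt(v_secondary)

print(round(v_secondary, 6), round(se_secondary, 3))
```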

3.10 Comments on Errors

The basic data for standard error equatings are the easiness
standard errors from the original Rasch analyses. Empirical standard
errors, obtained by calculating easinesses on multiple samples, yield
similar results. Thus, it is apparent that the error sources of concern
are the usual measurement errors and assignment errors. However,
assignment errors can be avoided by using the reference scale.

3.11 Equating Errors for the Ability Method

The arbitrariness of choosing the difficulty method over the
ability method is further documented by comparing the squared standard
errors of ability and easiness parameters. Both squared standard
errors are of the same form (Wright and Douglas, undated, pp. 4-7):

$$SE^2 = \Big[\sum pq\Big]^{-1}.$$

For easinesses, the sum is taken over persons; whereas, for abilities,
the sum is taken over items. In either case, we can use the estimates
previously discussed. Take 4 < c < 9 as a reasonable set of estimates
of the average of $(pq)^{-1}$.

Thus, the error variance for an item is estimated to be c/N.
Whereas, the error variance for an ability is estimated roughly to
be c/k, where k is the number of items on one of the tests. The
determination of equating constants begins by averaging the easinesses
within each set of items or by averaging the abilities. In the difficulty
method we get $c/Nk_1$ and $c/Nk_2$ for the crude estimates of the
error variances of the two averages that are used to calculate initial
entries in the 7 x 7 tables. However, these values are identical to
what we get for crude estimates of the error variances for mean ability.
Both error variances are functions of N, $k_1$, and $k_2$. The crude
estimates differ only in the order of operations.



Chapter 4 

Equipercentile and the Rasch Model: A Comparison of the Results 

This chapter is concerned with the similarity of results of the
equating of nonparallel tests which were produced by the Equating Phase
of the Anchor Test Study and by the use of Rasch Model techniques
presented in the last chapter. The topics considered briefly here are
the differences in the methodologies used in the two studies, the
differences in data organization and the presentation of the results,
some empirical comparisons based on selected subsamples of the data,
and a few conclusions derived from this attempt at comparison.

4.1 The Methodologies

Scores on two tests can be defined as equivalent in many different
ways, each method providing a way of converting the system of units of
one form to the system of units of the other so that scores derived from
the two forms after conversion will be directly equivalent.

The results of the equating of reading achievement tests have been
produced by three equating procedures: linear, equipercentile, and
Rasch. The linear definition of equating is that two raw scores are
equivalent if they are the same number of standard deviation units above
or below the means of their respective score distributions. The linear
method of equating assumes that two distributions of scores have the
same basic shape and differ only in their means and variances. Linear
equating is a very close approximation to equipercentile equating when
the shapes of the raw score distributions are similar; therefore, if
one is prepared to assume that differences in the shapes of the
distributions of raw scores on two forms are sufficiently trivial so
that they may be disregarded, linear equating may be preferable. Unlike
equipercentile equating, it is entirely analytical and verifiable and
is free from any errors of smoothing, which can produce serious errors
in the score range in which the data are scant and/or erratic.

The equipercentile definition of equating is that two raw scores
are equivalent if their percentile ranks are equal. One way of insuring
equivalent scores when the distribution shapes are different is to
equate by equipercentile methods. Generally, the conversion of X scores
to their equivalent Y scores will be curvilinear, and under such
circumstances the equivalency is established by stretching and compressing
the raw score scale of one of the forms so that its distribution will
conform to the shape given by the other form.
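The linear and equipercentile definitions can be illustrated with a minimal Python sketch. It is not the Anchor Test Study procedure: there is no smoothing or interpolation, the mid-point convention for percentile ranks is an assumption, and the function names and score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, x_scores, y_scores):
    """Y score lying as many SD units from its mean as x lies from the X mean."""
    z = (x - mean(x_scores)) / pstdev(x_scores)
    return mean(y_scores) + z * pstdev(y_scores)


def percentile_rank(score, scores):
    """Percent of scores below `score`, counting half of those equal to it."""
    below = sum(s < score for s in scores)
    equal = sum(s == score for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def equipercentile_equate(x, x_scores, y_scores):
    """Observed Y raw score whose percentile rank is closest to that of x."""
    target = percentile_rank(x, x_scores)
    return min(
        sorted(set(y_scores)),
        key=lambda y: abs(percentile_rank(y, y_scores) - target),
    )


# Hypothetical raw score distributions on two forms.
x_scores = [1, 2, 3, 4, 5]
y_scores = [10, 20, 30, 40, 50]
print(linear_equate(3, x_scores, y_scores))
print(equipercentile_equate(4, x_scores, y_scores))
```

With distributions of the same shape, as in this toy case, the two definitions agree; they diverge only when the shapes differ.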

The equating of tests using Rasch theory has been explained in the
previous chapter. In this procedure, two test scores are defined as
equivalent if they give rise to the same ability estimates. Raw score
equating is accomplished by assigning as equivalent that pair of scores
(from two tests) for which the ability scale difference is a minimum.
Since the ability scales resulting from the test calibration process
have equivalent scale units, only scale origin adjustments are necessary
to bring the two sets of abilities into a common scale. Rasch equating
consists mainly of estimating these adjustments (equating constants).

In the Equating Phase of the Anchor Test Study, two equating
methods were used. One technical objective of the Equating Phase was
to compare the equating tables that resulted from the linear equating
method and the equipercentile method. Between the two, the equipercentile
method was judged to be best. One reason for this judgement was
that the linear method resulted in impossible score values such as
negative scores or scores that exceeded the number of items in the
subtest.

The equipercentile method was applied by a technique developed
by Lord for using all data on a given test. This technique is somewhat
analogous to our procedure for combining data across various samples.
Lord's technique was applied twice for each base test depending on
whether the base test was administered first or second. These two
equatings yielded nearly identical results and the resulting two equating
tables were averaged. This procedure, explained fully in the Anchor
Test Study report, is their recommended procedure. For this reason,
in making the comparisons included here, only their recommended results
were used.


4.2 Data Organizations and the Presentations of the Equating Results

The primary results of the Equating Phase of the ATS were tables
for equating each of the seven reading tests to each of the other six
tests. Results were presented separately for grades 4, 5, and 6. This
series of equating tables makes it possible to translate a child's score
on any of the seven tests into an equivalent score on any of the other
tests appropriate for his grade level. Thus, results from each grade
are presented separately.

In the Rasch equating study, the seven tests were considered as
twenty-eight tests by separating them into their various forms and
levels. STEP having only one level, ITBS having three levels, and the
other five batteries each having two levels yields fourteen primary
and secondary forms. The Rasch results are presented with each of the
fourteen primary forms used as a base test with its secondary form and
the other primary and secondary forms being equated to it. Therefore,
fourteen tables are presented, each showing the raw scores for the
base test, the equated raw scores for its secondary forms and the

- ' -T t'f ^^ ^s" '''' t;;o studies are obvious. 

' ' r xores shov/n 
. . ' \r.. \) rv: c . vii^i grau^ level. Th6 

: 0"':^' /,.;, ^ -a :J scj-j':: for each of the four- 
. '.'^o^c ' ' ■ r:-: ' /:e j-^q^ating results 

' . . ^jb f:. u.^ vv-t ; '>oconda.y fu>^m is 

■ ' i: results of ^^'^ "^abch stjd/.^ 

'''„'vr tre Results 
In order to compare the results of the two studies, the following
procedures were used. Various subsamples were identified such that they
were composed only of those subjects who were administered a pair of
tests in a specified order. For example, one of these subsamples was
those who were first administered CAT Level 4 Form A and then
administered ITBS Level 12 Form 5. For each selected subsample, the first
test administered was treated as the base test. For each raw score on
the base test a conditional mean, the raw score equivalent scores
determined by ATS, and the raw score equivalent scores determined by the
Rasch project provide three different estimates of an equated score.
Conditional mean square residuals from each estimate were calculated.
Tables 4.3.1 through 4.3.9 provide examples of these comparisons. Also,
the distributions of scores on the base test are presented so that the
reader will know what the conditional sample sizes are. The conditional
mean squares are not calculated for conditional sample sizes of fewer
than five.

It might strike the reader as peculiar that equating of secondary
forms is necessary; yet, there were tests in the data base whose
secondary forms were sufficiently different in difficulty from their
primary forms that they could not be considered parallel.
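The comparison procedure (conditional means plus root mean square residuals about each of the three estimates) can be sketched in Python as below. The function name, the data layout, and the use of n rather than n - 1 as the divisor are assumptions for illustration; the rule of skipping base scores with fewer than five subjects comes from the text.

```python
from collections import defaultdict
from math import sqrt


def conditional_summary(pairs, rasch_table, equi_table, min_n=5):
    """Summarize second-test scores conditional on each base-test raw score.

    pairs: (base raw score, second-test raw score) for each subject.
    Returns {base score: (n, mean, rms about mean, rms about Rasch estimate,
    rms about equipercentile estimate)}; entries after n are None if n < min_n.
    """
    groups = defaultdict(list)
    for base, other in pairs:
        groups[base].append(other)

    def rms(values, centre):
        return sqrt(sum((v - centre) ** 2 for v in values) / len(values))

    out = {}
    for base, values in sorted(groups.items()):
        n = len(values)
        if n < min_n:
            out[base] = (n, None, None, None, None)
            continue
        m = sum(values) / n
        out[base] = (n, m, rms(values, m),
                     rms(values, rasch_table[base]),
                     rms(values, equi_table[base]))
    return out


# Hypothetical subjects and equating tables.
pairs = [(5, 4), (5, 5), (5, 6), (5, 6), (5, 9), (6, 7), (6, 8)]
summary = conditional_summary(pairs, {5: 5, 6: 7}, {5: 6, 6: 8})
print(summary[5])
print(summary[6])
```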

The first table presents a test pair both of which fit the Rasch
model fairly well. The tests included in Table 4.3.1 were given to and
designed for fourth graders only. The sample included 916 subjects who
were given the SAT Intermediate I Form W and then the ITBS Level 10
Form 5. These two tests have 63.2% and 57.8% of their respective slopes
between .8 and 1.2. The estimated results from the equating studies
differ at the most by two raw score points with most of these differences
in the lower half of the raw score scale. The root mean squares for the
upper half of the distribution are equal in most cases; however, these
values in the lower part of the distribution are smaller for the root
mean squares based on the equipercentile results.

The tests included in Table 4.3.2 were also given to and designed for
fourth graders only. The sample included 719 subjects who were
first given the MAT Elementary Form F and then the SAT Intermediate I
Form W. This analysis was made to show the comparison using the MAT,
the test used as the Anchor Test in the ATS, and a test defined as a
test of good fit in relation to the Rasch Model. The estimated results
from the equating studies differ by, at the most, three raw score points
with most of the differences in the second half of the raw score scale.
The root mean squares in this example are equal at only six points in
the raw score scale; therefore, the values of the Rasch root mean
squares and the equipercentile root mean squares differ throughout
the remainder of the distribution with the equipercentile values usually
being smaller but by varying amounts.

The tests included in Table 4.3.3 are MAT Elementary Form F and SRA
Blue Form E administered in that order to a sample of 871. Although this
SRA test was designed for grades four and five, in this administration
only fourth graders were involved because of its being paired with the
indicated level of MAT. Hence, the fourth grade equipercentile results
were used; the two procedures differ in this analysis by as much as four
raw score points in the second half of the distribution. The Rasch root
mean squares are usually larger throughout the analysis with the greatest
differences in these values found in the last one-third of the
distribution.

In Tables 4.3.4 and 4.3.5, the tests involved were the SAT
Intermediate II Form W and MAT Intermediate Form F. Both of these tests
were designed for both grades five and six. Therefore, in this comparison
using 12?1 subjects, Table 4.3.4 shows the analysis using the fifth
grade equipercentile results and Table 4.3.5 shows the analysis using
the sixth grade equipercentile results. Conditional means and Rasch
equating tables are identical in both tables. The differences in the
raw score estimates using the Rasch results and the equipercentile results
for the fifth grade are never more than one raw score point. The
sixth grade comparison using the equipercentile results differs at
some points from the Rasch estimates by two raw score points. In
both of these comparisons, the root mean squares for the equipercentile
estimate are consistently larger than the root mean squares using
the Rasch estimates. In some instances, these indices are equal, but
the Rasch root mean squares in both comparisons are generally smaller.

Tables 4.3.6 and 4.3.7 show the results of the analysis of MAT
Intermediate Form F and SAT Intermediate II Form W administered to 1431
subjects. In Table 4.3.6, the Rasch results are compared to the
recommended results for grade five from the ATS, and in Table 4.3.7, the
Rasch results are compared to the recommended results for grade 6. This
is the same procedure that was applied in Tables 4.3.4 and 4.3.5 except
that here the administration order of the tests was reversed; therefore,
a different sample was used. When the fifth grade estimate of the
equipercentile method was used in Table 4.3.6, the raw score predictions
differ mostly by two and once by three in the latter half of the raw
score scale. In most cases when the root mean squares differ, the
equipercentile root mean squares are smaller. When the sixth grade
results from the equipercentile method were used as the estimated raw
score, the predicted Rasch raw score differed at the most and only
occasionally by two raw score points. The root mean squares generally
are smaller for the equipercentile results with an average difference
less than one.

Table 4.3.8 shows results from a relatively poorly fitting test
pair: SRA Green Form E and CAT Level 4 Form A, administered in that
order to 774 subjects. Both of these tests were designed for grade 6
only; therefore, the equated results from the ATS were obtained from
their sixth grade tables. These two tests have only 47.6% and 47.5% of
their items respectively with slopes within the range of .8 to 1.2. The
predicted raw scores from the two methods differ at the most by two raw
score points with these being mostly located in the upper part of the raw
score scale. The root mean squares usually are smaller when the
equipercentile predictions are used.

The comparisons in Table 4.3.9 can be described as similar to
those in Table 4.3.8. Here, CAT Level 4 Form A and ITBS Level 12
Form 5 were administered, in that order, to 836 subjects. Since both
of these tests were designed for use at the sixth grade level only,
only sixth grade equated scores were available from the ATS. The
results shown in Table 4.3.9 differ somewhat from the other comparisons
that have been made, since the largest differences in raw score estimates
are found in the lower part of the distribution. Many of the root mean
squares in this comparison are equal for the two predictions; however,
for those that differ, the equipercentile root mean squares are usually
4.4 Concluding Comments

A comment on the value of the conditional means is appropriate.
These in no way are "correct answers." The conditional mean technique
is best considered a different equating technique that is not used.
A set of conditional means is identical to the unsmoothed general
curvilinear regression of the equated test on the base test. The
choice of a regression procedure would require two different equating
tables for each test pair, depending on which test of the pair was
regressed on the other. This aspect of regression technique makes it
an inappropriate one for defining score pairs as equivalent, since it
is inconsistent with the uniqueness aspect that equated scores must
have.

Moreover, the root mean squares of residuals are also easy to
overinterpret. The root mean squares of residuals around the conditional
means are always smaller than the corresponding values for the equated
scores. This results from a well-known statistical fact: sums of
squares around a mean are always less than sums of squares around
any other value. Thus, the comparative sizes of two root mean squares
do not provide unequivocal comparisons of results.
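
The statistical fact cited above can be checked numerically. The sketch below is illustrative only and is not part of the report's analysis: the two scores (4 and 8) are hypothetical equated-test scores for examinees sharing one base-test score, and `rms_about` is a helper name introduced here. It verifies the decomposition RMS(c)^2 = RMS(mean)^2 + (mean - c)^2, which is why residuals around the conditional mean can never be larger than residuals around an equated score c.

```python
import math

def rms_about(scores, center):
    """Root mean square of residuals of `scores` about `center`."""
    return math.sqrt(sum((s - center) ** 2 for s in scores) / len(scores))

# Hypothetical equated-test scores for examinees sharing one base-test score.
scores = [4, 8]
mean = sum(scores) / len(scores)

rms_mean = rms_about(scores, mean)  # residuals about the conditional mean
rms_r = rms_about(scores, 2)        # residuals about one equated score
rms_e = rms_about(scores, 3)        # residuals about another equated score

# Decomposition: RMS(c)^2 = RMS(mean)^2 + (mean - c)^2, so RMS(c) >= RMS(mean).
for center, rms_c in ((2, rms_r), (3, rms_e)):
    assert math.isclose(rms_c ** 2, rms_mean ** 2 + (mean - center) ** 2)

print(round(rms_mean, 1), round(rms_r, 1), round(rms_e, 1))  # 2.0 4.5 3.6
```

These figures happen to agree with the row for base score 4 in Table 4.3.7 (frequency 2, mean 6.0, root mean squares 2.0, 4.5, and 3.6), but the individual scores are an assumption; the report does not list them.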

Perhaps the best way to evaluate the results presented is a simple
visual inspection of the two vectors of equated scores. Many of the
scores for the two methods of equating are not different. Most
comparisons differ by one or two points, occasionally by three,
but rarely by four points. In general, the results are strikingly
similar; what deviations do exist are thoroughly eclipsed by the
respective standard errors of measurement.



Table 4.3.1 

Base Test: SAT Int. I Form W
Equated Test: ITBS Level 11 Form 5



Base Test 


Estimated 


Equated Test Root Mean 


Scores 


Scores 


Squares of Residuals 





Freq 


Mean 


R 


E 


1 


0. 




1. 


2. 


2 


3. 


* 


2. • 


3. 


3 


2. 


•h 


2. 


4. 


4 


3. 




3. 


5. 


5 


11. 


9.5 


4. 


6. 


6 


19. 


9.7 


5. 


7. 


7 


18. 


11.3 


6. 


8. 


8 


29. 


10.0 


7, 


9. 


9 


23. 


11.7 


8. 


10. 


10 


30. 


12.2 


9. 


11. 


11 


33. 


12.4 


10. 


12. 


12 


40. 


12.7 


11. 


13. 


13 


40. 


13.3 


12. 


14. 


14 . 


44. 


13.3 


13. 


15. 


15 


37. 


16.5 


]5. 


16. 


16 


36. 


17.2 


16. 


1. 


17 


32. 


lb. 3 


i;. 


13. 


-» o 




17.4 - 


18. 


19. 


19 


51. 


19. B 


Vj. 


21. 


20 


.*7, 


IV. 9 


21. 


"22. 


21 


40. 


23.5 


22. 


2'*. 


22 


30. 


?3.9 


23. 


24. 


23 


31. 


24.2 


25. 


;5. 


24 


36, 


25.6 


26. 


27. 


25 


3/. 


25.3 


27. - 


2';. 


26 




25.6 


pcj , 


?0 


27 








30. 


2d 


29. 


2v 1 


31. 


31. 


29 


14. 


31 .M 




32. 


30 


33. 


31.3 


33. 


33. 


31 


13. 


30.9 


34. 


34. 


32 


?0. 


32 . 6 


3S. 


35. 


33 


B. 


-^2.8 


36. 


35. 


34 


3. 


34.5 


36. 


36. 


35 


*> 




M. 


37. 


36 


6. 




3"^. 


37. 


37 




* 


37. 


38. 



Mean R
* * *

* . * * 

* * * 

* * * 
3.9 6.4 4.8 
2.7 5.4 3.8 

3.0 6.1 4.5 

3.6 4.6 3.7 

4.4 5.8 4.7 

4.7 5.7 4.8 
4.3 4.9 4.3 
4.9 5.2 5.0 

5.1 5.3 5.2 

4.5 5.1 4.5 

4.8 5.1 4.9 
5.1 5.2 5.1 

5.1 5.3 5.1 

5.5 5.6 5.3 

4.9 5.0 5.1 
5.0 3.1 5.4 

4.2 4.4 4.2 
4.9 5.0 4.9 
4.8 4.9 5.1 

5.3 5.3 5.5 
4.2 4.5 4.9 

4.8 5.8 5.8 

3.4 3.6 3.6 

3.9 4.3 4.3 

3.2 3.3 3.3 

4.6 4.9 4.9 
5.0 5.8 5.8 
2.9 3.8 3.8 
2.0 3.8 3.0 

1.3 2.0 2.0 

* * * 
1.9 2.9 • 2.9 

* * * 



*Values were not computed where frequencies were less than 5.



Table 4.3.2 

Base Test: MAT Level E Form F 
Equated Test: SAT Int. I Form W
Base Test
Scores 



Estlnatcd 
Scores 



Equated Test Root Mean 
Squares of Residuals 





Freq. 


Monn 






Mean 






1 


A 

U • 




1 

X . 


n 


* 


* 






A 
t • 




1 

1 . 


X • 


* 


* 


* 




0 ^ 


^ . J 


1 

1 • 


X • • 


7 5 


10.6 


10.6 


A 
*♦ 


1 

L • 


0 


1 

1 • 


9 


n 

V 


7.0 


6.0 


c 


A • 


iU. J 


1 

1 4 


9 


9 n 




8.2 


o 


Z • 


/ .U 


9 


0 • 


2 0 


5.4 


4.5 


7 


8. 






J. 


1 n 

J« u 




5.5 


O 


3. 


1 li . u 






1 L 

1 . H 


8.1 


6.2 


9 




/I C 


3 . 


/. 

M . 


1. -> 




2.9 


iO 


3. 


0. z 


J . 


c 

J . 


1 n 
1 • u 


S 3 


3.3 


1 1 


10. 


V . 7 


/, 

. 


c 

J • 


^ n 
J* 1/ 


6 5 


5.6 


12 


11 . 


9.5 




/: 

D. 


A A 


7 7 


5.6 


13' 


8 . 


8.8 


M . 


9 


9 7 




3.2 


1 A 




, 0 


5 . 


/ . 




s n 

^ . u 


3.7 


15 


. 


9.C 


• 3. 




9 !5 




3 0 








6 . 


p 
0 . 


J. • Q 


3.4 


1.9 


?. «• 




9. ? 


6 . 


ft 

y . 


9 ft 


H.J 


2.8 






I V. ^ 


*> 

/ . 


Q 


9 ^ 


4.0 


2.6 


1 ) 






7 


. Q 




5.0 


3.7 






ID. 0 


0 

0 . 


1 n 

lU . 


9 Q 


3.9 


3.0 


/ 1 




1 J. i 


q 






2.7 


2.4 




\ 




r. 


* 1 0 




2.3 


1.9 


J 


c 


1 'J 




1 1 

J. J. . 




7.2 


6.7 


/ 1 




J . 0 


1 n 

I M . 


J. JL 4 


2.0 


2:2 


2.0 


^ 1 


y . 




1 1 


1 7 


119 


2.1 


2.6 


J £ 


' . 


i <: . J 




J. — . 


3.6 


3.6 


3.6 






11. ^) 


1 9 




3 6 


4.0 


3.6 






I*.. / 


1 1 

1 ^ . 


1 


J. 


3.2 


3.2 






1 > C 


1 It . 


1 > 
i ) . 




4.0 


3.4 








1 1;. 
l.) . 


1 5 

L J* 




5.5 


6.2 


1 


\ 11 . 


J -4 . 1' 


1 c 


1 '1 


3 8 


3.9 


3.8 


"52 


l-» . 


1 r *> 

15 . 7 


1 c 


1^ . 


J. ^ 


3.5 


3.9 


33 


1 ^ . 


1 * c 
1 ♦ . 


1 7 






3.6 


2.9 


3/^ 


2 ' , 


1 i 


)8. 


15. 


3.3 


4.6 


3.4 








10. 


16. 


3.9 


4.3 


4.0 


36 






~ 19. 


17. 


3.5 


4.9 


3.8 


3" 






?0. 


17. 


3.8 


4.4 


3.9 


3S 


31 . 


18.? 


21. 


18. 


3.0 


4.1 


3.0 


39 




18.5 


r.2 . 


n. 


/ 4.2 


5.4 


4.2 


AO 


30. 


18.6 


23. 


20. 


/ 4.1 


6.0 


4.3 


<* I 




21.1 


JA. 


21. 




4.3 


3.1 


*V ' 




2\M 


25. 


21. 


3.1 


4.8 


3.2 






23.? 


26. 


23. 


/ ' 4.1 


5.0 


4.2 




29. 


24.1 


27. 


24. J 


3.5 


4.6 


3.5 




, 36. 




2^> . ' 


26. 


4.3 


5.5 


4.6 


A6 


*♦'"■>. 


26.2 


29. 


28. 


4.1 


4.9 


4.4 




2o. 


2P. * 


31. 


30. / 


3,R 


4.6 


4.1 




?j) . 


?8.c 


32 . 




3.5 


4.8 


4.8 




10. 


29. A 






2.5 


5.3 


5.3 



Table 4.3.3

Base Test: MAT Level E Form F
Equated Test: SRA Blue Form E

Base Test    Estimated    Equated Test Root Mean
Scores       Scores       Squares of Residuals



Freq 



Mean 



Mean 



JL 


V 




X 


n 


* 


* 






A 

u 




X 


X 




* 






L 


o . -) 


1 


X 


* 






A 


1 
X 


o • u 


1 

X 










c 
D 






1 
X 


0 






* 


6 








A 




it 




f 


o 

Q 


1 A 


9 




'K 0 


6.A 


A.O 


Q 

O 




inn 
xu • u 


. 0 


C 
-J 






* 


Q 




o • ^ 




A 








1 

iU 


e 


ft 7 


J 


7 


X • V 


5.3 


1;5 


JLX 


Xa/ 


Q 7 


A 


7 




6*5 


A.l 


1 9 


1 1 

XX 


* Q «i 
I* • J 


A 
*♦ 


3 


A. A 


7.1 


A. 7 




o 
o 


0 • o 


c: 


ft 

\j 


2«7 


A«6 


2.8 


A** 




Q n 


J 


q 


3 . 1 


5.0 


3.1 


1 ^ 


q 


^ » • » 


A 

0 


Q 


2 .8 


%.l 


2.8 






Q n 


A 


1 0 


1*6 


3. A 


1.9 


1 7 1 


i ^ 


Q 9 


7 


1 n 


2 ,8 


3.6 


2.9 


i 0 1 






7 




2 . 3 


3.9 


2.3 


1 

J 7 


7 


1 n o 


(5 




3 2 


A. 3 


3.2 




c t 


1 i . A 


7 


1 1 

X I. 




3.3 


2.9 


1 


1 L 


1 f"i 1 
iU. J 




■1 1 
XX 


? A 


2*7 


2,5 




\ 


in 1 


J. u 




A 








8 


X'* . ^ 


1 t 

i L 


1 V 

x^ 


S 8 


6.7 


6.2 


*i 

2«t 


1 1 


1 A 0 

lU. .5 


ii 


1 9 


*> n 


2.0 


2.3 


^ c 


V 


in 5 
Xu. 1 


1 o 


1 

XJ> 


X • 7 


2.6 


3»3 


'1 - 










A 


3.7 


3.7 


2 ? 


5 


11.0 


xA 


1 A 
X<l 


J • *♦ 


A S 


A. 5 


A. ' 


11 


12. 7 


IS 


i'l 


9 




3.5 


29 


ir 


11. 5 


15 




1 

J • X 


A , 7 


^.0 


3 J 




15.9 


io 


1 r 

i-> 


p • *♦ 


5. A 


5.5 


31 


11 


lA .0 


1'^ 


i J 


'4 ft 
J • O 


A ft 


3.9 


22 


1 '* 


15. 7 


IS 


J 0 


J • J 


A 1 

H.J. 


3.5 


33 


1 > 


lA.ft 


19 


17 


2.9 


5.1 


3.6 




/ >■ 


Is. 9 


20 


*! 


3.3 


6.1 


AsO 




I'i 


17.1 


21 


lb 




^ A 


\ q 




1 \ 




23 


19 


3.5 


8,2 


A. 9 


> ; 




:7.o 


?A 


20 


3.8 


7.2 


A. A 


> > 






25 


21 


3.0 


7*A 


A.l 


3^ 






26 


22 


A. 2 


8.6 


* 5. A 


AO 






27 


2A ' 


A.l 


9.3 


6.8 


41 


:o 




29 




3.1 


8.5 


5.0 


^2 


32 


■21. A 


30 ' 


27 


3.1 


9.1 


6.A 


Ai 


A J 


?'5.2 


32 


29 


A.l 


9.7 


7.1 


/^A 


:r> 


2A.1 


33 


31 


2.5 


9.6 


7.8 


^♦S 




2A.6 


3A 


-^3 


A. 3 


10.3 


9.A 






2(». 2 


36 


35 


A.l 


10.6 


9.7 


A"? 


26 




37 


37 


3.8 


9. A 


9. A 




?f 




39 


39 


3.5 


10.8 


10.8 


A 9 


iO 


2V.A 


AO 


AO 


2.5 


10.9 


10.9 




*Values were not computed where frequencies were less than 5.

Table 4.3.4

Base Test: SAT Int. II Form W
Equated Test: MAT Int. Form F



Base Test 
Scores 



Estimated 
Scores 



Mean 



Equated Test Root Mean 
Squares of Residuals 



Mean 



1 


0. 




L . 




2 


z. 




A 

*♦ . 


A . 


3 


5. 


y . u 


r 

0 . 


A 
u . 


A 


10. 


13 . J 


0 

0 . 


7 


5 


16. 




lU. 


Q 


6 


20. 




1 1 
IX . 


in 


7 


27 . 








8 


27 . 


1 *J Q 


1/1 




9 


35. 


1 ^ Q 




1A 


10 - 


AA. 


17 .6 




'i c 
10 . 


11 


AA . 


19 . 1 




1 7 


12 


60. 


18. 3 


on 


1.7 . 


13 


A8. 


19. A 


Zi. 


91 


lA 


, A6. 


21.9 


LI . 




15 


60. 


23 . 3 






< 16 


AS. 


23.0 


25. 


25. 


17 


50. 


26.1 


26. 


27 . 


18 


A8. 


27.8 


27. 


9Q 


19 


37. 


27 .9 


00 

. 


90 


20 


A5. 


28. A 


2^ . 


JX. 


V 


37. 


29.3 


30. 


19 


22 


35. 


31.5 


31. 




23 


/AA. 


32.2 


32. 


"7 A 


2A 


37. 


32.8 


33. 




25 


A6. 


3A.1 


3A. 


3d . 


26 


' 35. 


3A.6 


35. 


J/ . 


27 


3A. 


36.6 


36. 


Jo . 


23 


33. 


37.1 


JO . 


29 


22. 


37.3 


38. 


'iO 

. 


30 


28. 


37.7 


IS. 


AO. 


31 


20. 


38.0 


39. 


Al. 


32 


35. 


AO. 2 


AO. 


Al. 


33 
3'* 


2A. 
30. 


38.6 
^1.8 




A2. 
A3. 


35 


15. 


AO. 7 




AA. 


36 


13. 




A'J. 


AA. 


37 


20. 


AA.6 


A'*. 




3S 


10. 


A3. 3 


AS. 


A6. 


39 


13. 

12; 


A5I2 


A5. 


A7. 


AO 


A6.0 


A6. 


A7. 


Al 


9. 


A6.9 


A6. 


A8. 


. A2 


6. 


A7.5 


A7. 


A8. 


A3 


A. 


* > A8. 


A9. 


A A 


5. 


A6.a 


A8. 


A9. 


A5 


1. ' 




A9. 


50. 


A6 


I. 




A9. 


50. 


A7 


3. 


.* 


- A9. 


5Q. 

/ 



2. a 



\ A. 8 
' 5.3 



5.3 
A. 2 
3.9 
5.2 
6.2 
A. 9 
5.6 
6.0 
5.3, 
6. A 
5.8 
6.*. 
6.1 
5.5 
6. A 
5.1 
A. 7 
5.5 
6.1 
6. A 
A. A 
6.7 
A. 7 
3.8 
A .2 
6-1 
3.9 
3. A 
2.9 
A .1 
2.7 
2. A 
2.8 

2.i 

2.0 
2. A 
1.6 

•k 

1.5 



*Values were not computed where frequencies were less than 5.
*

A.l 

6.3 

6.7 

6.1 

A. 3 

5.2 

5.2 

6.2 

A. 9 

5.8 

6.2 

5.3 

6. A 

6.1 

6. A ^ 

6.1 

5.5 

6. A 

5.1 

. A. 7 
5.5 
6.1 
6. A 
A. A 
6.7 
A. 7 ' 
3*8 
A. 2 
6.2 
3.9 
A.l 
2.9 
A.J 
2.7 
2.5 
3.2 
2.2 
2.0 
2.6 
1.7 
* 

1.9 
* 



A.l 
7. J 
7.5 
6.6 
A. 6 
5.8 
5.9 
6. A 

5. A 
5.6 
6.2 
5.3 

6. A ^ 
6.1 
6\A 
6.1 
5.6 
6.9 
5.8 
5.0 
5.8 
6.5 
6.7 
5.0 
6.8 
A-.8 
A.l 
A. 8 
6.9 
A.O 
A. 6 
3.2 
5.3 
2.8 
2.4 
3.9 
2.9 
2.3 
2.7 . 
1.7 

2.6 
* 

*

Table 4.3.5

Base Test: SAT Int. II Form
Equated Test: MAT Int. Form F

Base Test    Estimated    Equated Test Root Mean
Scores       Scores       Squares of Residuals










R 


E 


Mean 




E 


1 


0. 




2. 


0 

1. 




* 


it 


2 


2. 


* 


4. 


2. 


* 




It 




✓ • 


9 






2.8 


4*0 


4 .8 




10, 


13*3 


8. 






6.3 


7.1 


c 


16. 


1 f< 

■I > 


1 0 






6.7 


7 S 


A 

1 


2 }. 


« « ' 


1 1 


1 0 

XV/ < 


5.3 


6.1 


6.6 


7 


>7 


J. 3 . o 


X J . 


XX . 


2 


4 3 




0 


^ f t 


1 / . > 




1 9 


Q 

J. 


«i 9 


0.0 






io. o 




m • 








i J 


. 


1 7 

J ' 0 


"I 7 


1 Q 

i-? . 


" 9 


fi 9 


6^7 

r 


1 t 
' * 




10 1 


\ r\ 




A Q 


A Q 


S ft 


I - 




1 Q 




'i ft 
xo . 




*i ft 






* 






i f * 


0 . V/ 


9 










\ ' 
• • 


\ 


3. ^ 


^ . ^ 


-/ . J 












A 


6 4 


6.4 


J 








9 \ 


. 0 


U . X 


5 ft 


1 ' 








OA 


^ A 


6 4 


6.4 




' o 

I 






?1 


1 


6.1 


6.1 






* ; (* 






5. S 


5.5 


5.5 












u • •? 


6.4 


6.6 










)1 


c: 1 

. X 


^ * X 


5.4 






) • ^ 






4. 7 


4.7 


4.7 










^ * • 


3.5 


5.5 


5.6 


> , 


-> • 






34. 


6.1 


6.1 


6.2 


) 


A 


t > 1 




. 


6.4 


6.4 


6.5 




• 


1 ♦ 


^ ^ . 




4.4 


4.4 


4.6 


/ / 




' ^ N r, 




•J / . 


6.7 


6.7 


6.7 




I i 


/ . ^ 


7 


00 . 


4.7 


4.7 


4.8 






V . . 






3.8 


' 3.8 


4.1 






^ 7 . 






4.2 


4,2 


4.8. 


.' 


^.'^ 

.•.V * 






40. 


f» 1 


6.2 


6.5 


) « 








•^1. 




3.9 


4.0 


> 

> 






^ 


42. 


3,4 


4.1 


4.8 








4'^ > 


43. 


2.9 


2.9, 


3.2 










43. 


4.1 


4.3 


4.7 


^ '> 




'♦3 2 


4 J. 


44. 


2.7 


2.7 


2.8 




:o» 


<Vt . 6 


a;. 


45. 


2.4 


2;5 


2.4 




10. 






45. 


2.8 


3.2 


3.2 




n. 


43.2 




46, 


2.2 


2.2 


2.4 




] ^ . 






47. 


2.0 


2.0 


2.3 








'.^ . 


47. 


2.4 


2.6 , 


2.4 




6. 






43. 


1.6 


1.7 


1.7 


f 




V 




*<9. 




■k 


A 




5. 






49. 


1.5 


1.9 


2.6 


/ , 


1. 




V). 


50. 




* 


* 




I. 






50. 


* 


* 


* 




1. 






50. 




* 


*

*Values were not computed where frequencies were less than 5.

Table 4.3.6

Base Test: MAT Int. Form F
Equated Test: SAT Int. II Form W

Base Test    Estimated    Equated Test Root Mean
Scores       Scores       Squares of Residuals





Freq. 


Mean 


R 


E 




Mean 


R 


E 


1 


2. 


8.0' 


1. 


0. 






* 


4c 


2 


0. 




1. 


0. 




it 




4c *■ 


3 


1. 


13.0 


1. 


1. 




it 


4c 


vc 


A 


2 • 


6.0 


2. 


2. 




* 


4c 


IC 


3 


1. 


5.0 


2 , 


2. 






^ 


4c 


6 




9.3 


3. 


3. 




* 


4c 


4c 


7 


\'- 


10.3 


3. 


4 • 




it 


4c 


4c 


8 




11.5 


4. 


, 5. 






8.7 


T ft 

7.9 


9 


10. 


7.7 


5. 


5. 




2 .9 


3.9 


3.9 


10 


7 . 


11.9 


5. 


6. 




3.6 


7 .8 


C ft 

0.9 


11 


IS. 


8.9 


6. 


7. 




2.9 


4 .1 


3.5 


12 


25. 


10.3 


6. 


7. 




3.4 . 


c c 


M . / 


13 


32. 


11.4 


7. 


8. 




2,7 


5.1 




14 


3i. 


10.5 


8. 


9. 




3.2 


M .U 


J.J 


15 


:Vi. 


10.8 




10. 




2.3 


J . D 


*> /. 
Z .M 


16 


*'tj . 


12.5 


9 . 


10. 




3.3 


/. Q 


A 1 
M . 1 


■• *> 


39. 




10. 


11, 




4.2 


/. /. 
if .M 


/. 7 


13 


29. 


12,5 


11. 


12. 




3.1 


J .5 


Q 7 
J . Z 


19 


31. 


12.7 


11. 


12. 




3.9 


M .J 


A rv 
M . u 


20 


29. 


14.0 


12. 


•13. 




3.6 


M . U 


J . / 


21 


21. 


14.5 


] 3. 


1j. 




3.8 


/. 1 

H . 1 


/. 1 




39. 


14.5 


14 . 


14. 




3^2 


J .J 


J . J 




35. 




14. 


15. 




3.8 


^ . I 


7 0 
J . 7 




35. 


1 S . 3 


1 5. 


15. 




4.2 


4.3 






3'^. 


I ; . 1 




16, 




4.2 


4.3 




26 


41. 


17.^ 


17. 


16. 




4.0 


4.0 


4,1~ 


27 


47. 


17.9 


13, 


17. 




4.6 


4.6 


4.7 


28 


38. 


20.2 


19.. 


. 18. 




4.5 


4.7 


5.0 


29 


50, 


19.3 


20. 


19. 




5.1 


5.1 


5.1 


30 


40. 


. 19.7 


21. 


20. 




4.1 


4.3 


4.1 


31 


40. 


?1.4 


22. 


20. 




4.8 


4.9 


5.0 


32 


4A. 


20.4 


23. 


21. 




4.8 


5.4 


4.0 


33 


42. 


22.8 


24. 


22. 




4.6 


4.8 


4.7 


34 


4'*. 


23. 3 


J5. 


23 . 




0.3 


0.3 


O.J 


35 


47. 


24-. 2 


26. 


24. 




5.3 


5.6 


5.3 


36 


57. 


26.7 


27. 


25. 




4.6 


4.6 


4.9 


37 


33. 


27.3 


28. 


26. 




4.8 


4.8 


5.0 


38 


49. 


27.7 


29. 


28. 




5.0 


5.1 


5.0' 


39 


, 48. 


29.6 


31. 


29. 




4.8 


5.0 


4.9 


AO 


46. 


30.0 


32. 


30. 




5,0 


5.4 


5.0 


41 


/ 


3'^B 


33. 


32. 




5.3 


5.3 


5.4 


42 


; 37. 


P.I 


3'». 


33. 




4.0 ' 


4.4 


4.1 


'.3 


/ . 


" 3?. 7 


3'"). 


34. 




4.2 


5.3 


4.4 


'.4 




34.^ 


37. 


36. 




5.3 


5.8 


5.5 


4S 




37. '> 


-^9. 


37. 




4.7 


5.0 


4.7 


46 


3/. 


3;. 7 


40. 


38. 




3.6 


4.3 


3.7 


47 


•9. 


3 '.B 


4?. 


40. 




4.1 


5.9 


4.7 


4«^ 




y .s 


4^». 


41. 




5.0 


7.5 


5.6 


49 


il. 


41, S 


4^. 


44. 




2.8 


5.3 


3.7 



*Values were not computed where frequencies were less than 5.



Table 4.3.7

Base Test: MAT Int. Form F
Equated Test: SAT Int. II Form W

Base Test              Estimated             Equated Test Root Mean
Scores                 Scores                Squares of Residuals

        Freq.    Mean     R      E         Mean     R      E

   1      2.      8.0     1.     1.         1.0    7.1    7.1
   2      0.              1.     2.          *      *      *
   3      1.     13.0     1.     2.         0.0   12.0   11.0
   4      2.      6.0     2.     3.         2.0    4.5    3.6
   5      1.      5.0     2.     3.         0.0    3.0    2.0
   6      3.      9.3     3.     4.         6.2    8.9    8.2
   7      4.     10.3     3.     4.         3.4    8.0    7.1
   8      6.     11.5     4.     5.         4.4    8.7    7.9
   9     10.      7.7     5.     6.         2.9    3.9    3.3
  10      7.     11.9     5.     6.         3.6    7.8    6.9
  11     18.      8.9     6.     7.         2.9    4.1    3.5
  12     25.     10.3     6.     8.         3.4    5.5    4.1
  13     32.     11.4     7.     9.         2.7    5.1    3.6
  14     31.     10.5     8.     9.         3.2    4.0    3.5
  15     28.     10.8     8.    10.         2.3    3.6    2.4
  16     26.     12.5     9.    11.         3.3    4.8    3.6
  17     39.     11.2    10.    12.         4.2    4.4    4.3
  18     29.     12.5    11.    12.         3.1    3.5    3.2
  19     31.     12.7    11.    13.         3.9    4.3    3.9
  20     29.     13.7    12.    14.         3.6    4.0    3.6
  21     21.     14.5    13.    14.         3.8    4.1    3.9
  22     39.     14.5    14.    15.         3.2    3.3    3.3
  23     35.     15.8    14.    15.         3.8    4.2    3.9
  24     35.     15.3    15.    16.         4.2    4.3    4.3
  25     3?.     17.1    16.    17.         4.2    4.3    4.2
  26     41.     17.0    17.    17.         4.0    4.0    4.0
  27     47.     17.9    18.    18.         4.6    4.6    4.6
  28     38.     20.1    19.    19.         4.5    4.7    4.7
  29     50.     19.3    20.    20.         5.1    5.1    5.1
  30     49.     19.7    21.    20.         4.1    4.3    4.1
  31     40.     21.4    22.    21.         4.8    4.9    4.8
  32     44.     20.4    23.    22.         4.8    5.4    5.0
  33     42.     22.8    24.    23.         4.6    4.8    4.6
  34     44.     23.3    25.    24.         4.3    6.5    6.3
  35     47.     24.2    26.    25.         5.3    5.6    5.4
  36     57.     26.7    27.    26.         4.6    4.6    4.7
  37     35.     27.3    28.    27.         4.8    4.8    4.8
  38     49.     27.7    29.    28.         5.0    5.1    5.0
  39     48.     29.6    31.    29.         4.8    5.0    4.9
  40     46.     30.0    32.    31.         5.0    5.4    5.1
  41     34.     32.8    33.    32.         5.3    5.3    5.4
  42     37.     32.1    34.    33.         4.0    4.4    4.1
  43     30.     32.7    36.    35.         4.2    5.3    4.8
  44     40.     34.7    37.    36.         5.3    5.8    5.5
  45     32.     37.4    39.    38.         4.7    5.0    4.8
  46     37.     37.4    40.    39.         3.6    4.3    3.9
  47     19.     37.8    42.    41.         4.1    5.9    5.2
  48     17.     38.5    44.    42.         5.0    7.5    6.2
  49     11.     41.5    46.    44.         2.8    5.3    3.7

*Values were not computed where frequencies were less than 5.
? marks a digit that is illegible in the source copy.

Table 4.3.8

Base Test: SRA Green Form E
Equated Test: CAT Level 4 Form A

Base Test    Estimated    Equated Test Root Mean
Scores       Scores       Squares of Residuals

*Values were not computed where frequencies were less than 5.





Freq. 


Mean 



R 

— 


E 

— 


Mean 


R 


lE 

'■■ ~ 


1 


0. 






1. 


* 


* 


1 * 


2 


0. 




2. 


1. 


* 






3 


0. 




2. 


2. 


* 




1 « 

i 


4 


2. 




3. 


2. 


* 




; * 


5 


5. 


!).4 


4. 


3. 


2.4 


2.8 


|3.4 


6 


1. 




4. 


4.' 


* 


it 


{ * 


7 


3. 




5. 


5. 


* 


it 




8 


3. 


9.8 


6. 


6. 


3.5 


5.2 


' 5.2 


9 


ID. 


10.2 


7. 


7. 


3.5 


4.7 


1 4.7 


10 


20. 


8.6 


S. 


8. 


4.2 


4.2 


4.2 


11 


11. 


10. S 


B, 


9. 


3.7 


4.4 


4.0 


12 . 


22. 


11.3 


9. 


10. 


4.9 ' 


5.4 


5.1 


13 


IS. 


9.6 


10. 


n. 


3.7 


3.8 


4.0 


14 


13., 


12.7 


11. 


12. 


4.1 


4.3 


4.2 


15 




12.9 


12. 


13. 


- 4.5 


4.6 


4.5 


16 


11.^ 


13.3 


; 2 . 


^4. 


3.8 


4.0 


3.9 


17 


n. 


14.3 


13. 


14. 


3.8 


4.1 


; 3.8 


18 


23. 


13.1 


14. 


15. 


3.6 


3.7 


1 3.6 


19 


21. 


16. 5 


15. 


16. 


3.5 


3.8 


I 3.6 


20 


22. 


16.9 


16. 


17. 


4.0 


4.1 


4.0 


21 


/ 20. 


16.1 


17. 


18. 


4.1 


4.2 


4.5 


22 


i 2^. 


19.6 


17. 


13. 


3.6 


4.4 


3.9 


2 < 


16. 


19.8 


18. 


19. 


3.3 


3.8 


3.4 


.?4 


2^. 


19.8 


19. 


20. 


4.6 


4.6 ^ 


4.6 


25 


20. 


20.3 


20, 


21 . 


5.2 


5.2 


i 5.6 


26 


19. 


21.4 


21 . 


21. 


3.9 


4.0 


4.0 


27 


24. 


22.8 


22. 


22. 


4.1 


4.1 \ 


4.1 


28 


22. 


23.3 


23. 


23. 


3.7 


3.8 


3.8 


29 


35. 


24.2 


24. 


24. 


3.5 


3.5 


, 3.5 


30 


2^. 


24.2 


25, 


25. 


2.6 


2.7 , 


\ 2.7 


31 


28. 


24.9 


26. 


25. 


2.9 


3.1 


\ 2.9 


32 


JO. 


2-). 8 


27. 


26. 


4.3 


4.5 


\ 


33 


27. 


27.0 


2B. 


27, 


'2.8 


2.9 


\ 2.8 


3'. 


v^. 


27.8 


:9. 


28. 


2.8 


3.0 


\ 2.8 


•^5 


?'). 


2>^.4 


30. 


29. 


3.4 


3.7 


\3.4 


36 


'3/. 


2^.9 


31. 


30. 


4.0 


4.5 


\4.1 


37 


32. 




3 2. 


31. 


2.4 


3.5 


2.9 


38 


25. 


31.2 


34. 


32. 


2.7 


3.9 


2.8 


39 


3?. 


31 . 2 


35. 


33. 


2.8 


4.7 


3.3 


40 


23. 


32. S 


36. 


34. 


2.0 


3.7 


:^.3 




1^*. 


33.1 


23. 


36. 


3.0 


5.7 


4.2

Table 4.3.9

Base Test: CAT Level 4 Form A
Equated Test: ITBS Level 12 Form 5

Base Test    Estimated    Equated Test Root Mean
Scores       Scores       Squares of Residuals






r rc*C| . 


Mean 




E 


' Mean 

1 ' 


R i 




1 
1 


1 

1. • 


* 


1. 


3. 


* 




* 


2 


0. 


* 


2. 


A. 


* 




* 


•J 


A 
V • 


* 


3. 


5. 




* ! 




/ 


1 

1 • 




4. 


6. • 


* 




* 


c 
D 




13.3 


5. 


8. 


1.6 


8.5 


5.6 


c 
0 


1 c 
i J • 


13.3 


6. 


q 


3.2 


7.9 


5.3 


7 


7 


10.6 


8. 


10. 


3.5 


4.3 


3.5 


o 
0 


1 1 
1 J . 


1^.8 


9. 


1 1 

XX . 


3.2 


6.7 


5.0 


rs 
9 


lb . 


12.8 . 


10. 


1. ^ • 


3.1 


4.1 


3.2 


10 


^ /: 
JLO . 


13.9 


11. 


1. J . 


3.2 


4.3 


3.3 


11 


21 . 


12.8 


12. 




4.3 


4.4 


4.5 


12 


22 . 


15.8 , 


14. 


i J . 


J • X 


5.4 


5.2 


13 


33. 


16.8 


15. 






3.8 


3.4 




lo 


17.5 


16. 


1 7 


4.7 ' 


4.9 


4.7 


15 


20, 


17.9 


18. 


1 Q 


5.0 


5.0 


5.0 


16 


30,. 


20.1 


19. 


ly . 


5.1 

J • X 


5.2 


5.2 


17 


36. 


20.9 


20. 


on 


4.3 


4.9 


4.4 


1 o 

lo 


Id. 


22.2 


21. 


^ 1. . 


4.0 


4.2 


4.2 


19 


28. 


22.8 


23. 


z J . 


S 0 

J. w 


5.0 


5.0 


20 


17 . 


2A.2 


24. 


0 A 


4.8 


4.8 


4.8 




2o. 


2A. 7 


25. 


z ^ . 


4.7 


4;7 


4.7 




JO . 


25.6 


27. 


26. 




5.1 


4.9 


23 


/ 

3^ • 


26.1 


2S. 


97 


4.8 


5.2 


4.9 


2q 




27,3 


29. 


9.^ 


4.4 


4.7 


4.5 


25 


3 . 


30.1 


30. 


9Q 


4*5 


4,5 


4.7 


25 


33 . 


;c.6 


22. 


Ji . 


4.1 


4.4 i 


4.2 


27 


35 . 


31.3 


33. 


"^9 


4.2 


A. 5 i 


4.2 


28 


AC . 


32.1 


34. 


J J . 


5.1 


5.5 


5.2 


29 


AC , 


^2.8 


35. 


34. 


4.2 


4.7 


4.4 


30 


3<^. 


33. >^ 


36. 


35. 


4.3 


4.9 1 


4.5 


* 31 


29. 




37. 


37.. 


3.9 


4.1 i 


4.1 


32 


27. 


'36.3 


3B. 


38. 


2.8 


3.3 


3.3 


33 


23. 


35.9 


39.. 


39. 


4.5 


5.5 


5.5 


34 


26. 


39.3 


^0. 


40. 


4.1 


4.2 


4.2 


3S 


17. 


)7.1 


M . 


41. 


2,6 


4.7 


4.7 


36 


h\. 


39.2 


42. 


42.^ 


3. A 


4.4 


4.8 


37 


3. 




A3. 


43. 




* 


A 


33 


6. 




44. 


44. 


1.3 


1.6 


1.6 


39 


2. 




45. 


45. 


* 


* 


* 



*Values were not computed where frequencies were less than 5.

Chapter 5
Summary and Conclusions
We would like to conclude this report by discussing three issues
that we feel ought to be addressed. The issues are model-data fit,
errors, and the National Reference Scale for Reading (NRS). Also, we
discuss the use of item scale values for constructing new tests and
interpreting such tests with the National Reference Scale. These
comments will be followed by restating the project objectives, directing
the reader to relevant sections of the report, and briefly restating
conclusions when relevant.

5.1 Some Comments 

In one sense model-data fit is the central issue in this report.
If the Rasch model fits the tests used here, then its consequences
simplify the equating problem. We have presented a number of ways to
evaluate fit and we have attempted to persuade the reader that fit
criteria can and ought to be different for different applications. We
do not believe that a routine application of some statistical test is
adequate or even correct. The problem refuses to be tied in a nice
neat package. The tests used here are neither very good nor very bad.
Their fit is mediocre, and, in fact, rather homogeneously mediocre;
yet, we believe the degree of model-data fit is sufficient for test
equating applications. One thing is clear: we need to learn more about
model-data fit and the robustness of the Rasch Model for test analysis
applications.

One probably ought not to equate tests, at least in the sense of
raw score-to-raw score equating. It certainly is an unnecessary step,
since it requires the tests to be put on a common scale before equivalent
raw scores can be determined, and that process itself is sufficient.
Going one step further, to raw score equating, leads to assignment
errors (i.e., those errors resulting from calling two raw scores equal
when their ability difference is not zero). These errors are not
inconsequential, as Table 5.1.1 illustrates.

That table presents a typical score (19) on STEP II Vocabulary
together with several error estimates associated with that score. These
errors are shown in both log ability units and NRS units. The standard
error of measurement, the stability index for occurrence, and the four
sample size stability values pertain to the score itself, while the
equating constant error and the stability indexes for race and IQ are
averaged for the test. Finally, the average assignment error was taken
over the scores of the 14 other tests to which the score of 19 was equated
(see Table 10, Appendix E). We present these values to give the reader
a quick summary of the size of errors from various sources as compared
with the standard error of measurement. Naturally, these errors are not
independent, but it is interesting to note their relative size, and
particularly the fact that assignment errors are as large as or larger
than any other source. However, for all practical purposes, the various
errors are quite small in comparison to the standard error of measurement.
It is important to note that assignment error occurs in traditional
methodology such as that used in the ATS. That is, even if one does not
choose to use trait methods, raw score-to-raw score equating yields
assignment errors. Two scores on different tests must correspond to
slightly different ability levels. Assignment error can be avoided only
if one uses scaling methods that generate reference scales like our NRS.

Table 5.1.1

A Comparison of Various Error Sources for a Raw Score
of 19 on STEP II Level 3 Form A Vocabulary

                                    Log Ability      NRS

Raw score = 19                         .746          207
Standard error of measurement          .45           4.5
Average assignment error               .0414         0.4
Equating constant error                .0085         0.1
Stability
    N = 500                            .0133         0.1
    N = 1000                           .0100         0.1
    N = 2000                           .0072         0.1
    N = 4000                           .0062         0.1
    Occurrence                         .0178         0.2
    Race                               .0266         0.3
    IQ                                 .0422         0.4

5.2 The National Reference Scale for Reading

Because a common Rasch Model ability scale was necessary for equating
and since we feel that in principle this scale is a natural and obvious
one for calibrating and reporting scores, we have produced a
transformation of the adjusted ability values and called it the National
Reference Scale for Reading. The NRS is a simple linear transformation of
the log ability values corresponding to a test's raw scores. The
transformation is: NRS = 200 + 10(A + C), where A is the ability estimate
for a given raw score on a particular test given in the tables in Volume II
and C is the equating constant for that test. This scale spans all tests
and all levels; it essentially provides both horizontal and vertical
equating of the tests, and does not depend on who happens to take the
test and which test they take.
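
As a small illustration of the transformation, the sketch below applies NRS = 200 + 10(A + C) to the log ability of .746 listed for a raw score of 19 in Table 5.1.1. The function names are introduced here for illustration, and the equating constant of 0.0 is a placeholder rather than the published constant for that test; treating .746 as an already adjusted value is likewise an assumption. Because the transformation is linear, an error expressed in log ability units is simply multiplied by 10 on the NRS.

```python
def nrs_score(log_ability, equating_constant):
    """NRS = 200 + 10(A + C), as stated in the text."""
    return 200 + 10 * (log_ability + equating_constant)

def nrs_error(log_ability_error):
    """Errors scale by 10 on the NRS; the origin of 200 drops out."""
    return 10 * log_ability_error

# A = .746 is the log ability for a raw score of 19 in Table 5.1.1.
# C = 0.0 is a placeholder, not the published constant for that test.
print(round(nrs_score(0.746, 0.0)))  # 207
print(round(nrs_error(0.45), 1))     # 4.5
```

Both results agree with the NRS column of Table 5.1.1 (207 for the score and 4.5 for the standard error of measurement).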

We chose this particular transformation for several practical
reasons: (1) Log ability is a scale that is not familiar to very many
potential test users; it is frequently confusing and sometimes difficult
to communicate. Test users are more comfortable with an "integer" scale.
(2) The lowest score on the easiest test is 144 and the highest score on
the hardest test is 263 (for the tests used here), a difference of 119
units, when one significant digit of the log ability scale is carried.
Thus, a three digit scale is required to span the score range. (3) It is
our desire to eventually add lower level tests and upper level tests
(or to see it done) to this NRS for reading; thus sufficient "floor"
and "ceiling" needed to be provided. Centering the scale where we
did ought to provide room at both ends.

5.3 Estimating National Reference Scale Scores from any Collection 
of Items 

In Volume II we provide, for each test, item and ability parameter
estimates, NRS scores for all raw scores, and the adjustment constant
for each test. With the item calibrations and the adjustment (equating)
constant, it is a simple matter to produce a test scoring table of raw
scores, NRS score equivalents, and standard errors of measurement for
any test of any length using any collection of the 2,644 items from
any of the tests. For example, suppose you wanted to put together a
28 item test composed of one item from each of the 14 primary forms
of the tests included here, both vocabulary and comprehension. The
steps to follow are:

1. Select the items you wish to administer.

2. Record their item calibration values (labelled "LOG
EASINESS") from the tables in Volume II.

3. Subtract each test's equating constant from the Log Easiness
of the items drawn from that test.

4. Enter these adjusted item easiness values into the short
FORTRAN computer program provided in Appendix G to
estimate NRS scores.

5. The output would be a scoring table with raw scores (1-27),
their equivalent NRS scores, and corresponding standard
errors of measurement.
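
The steps above can be sketched in Python. This is not the FORTRAN program of Appendix G; it is a minimal illustration that assumes the usual Rasch maximum likelihood estimate of ability for each raw score (solved by Newton-Raphson) and the usual information-based standard error. The function names and the sample easiness values are hypothetical.

```python
import math


def ability_for_raw_score(easiness, raw_score, tol=1e-8, max_iter=100):
    """Solve sum_i P_i(b) = raw_score for log ability b by Newton-Raphson.

    P_i(b) = 1 / (1 + exp(-(b + e_i))), where e_i is an adjusted log easiness.
    Returns (b, standard error of b in log ability units).
    """
    n = len(easiness)
    if not 0 < raw_score < n:
        raise ValueError("zero and perfect scores have no finite estimate")
    # Starting value: exact solution when all items are equally easy.
    b = math.log(raw_score / (n - raw_score)) - sum(easiness) / n
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b + e))) for e in easiness]
        info = sum(pi * (1.0 - pi) for pi in p)  # test information at b
        step = (raw_score - sum(p)) / info
        step = max(-1.0, min(1.0, step))         # damp very large steps
        b += step
        if abs(step) < tol:
            break
    p = [1.0 / (1.0 + math.exp(-(b + e))) for e in easiness]
    info = sum(pi * (1.0 - pi) for pi in p)
    return b, 1.0 / math.sqrt(info)


def scoring_table(easiness):
    """Rows of (raw score, NRS score, standard error in NRS units)."""
    rows = []
    for raw in range(1, len(easiness)):
        b, se = ability_for_raw_score(easiness, raw)
        rows.append((raw, round(200 + 10 * b), round(10 * se, 1)))
    return rows


# Hypothetical adjusted log easiness values for a five-item test.
adjusted_easiness = [1.2, 0.4, 0.0, -0.5, -1.1]
for raw, nrs, se in scoring_table(adjusted_easiness):
    print(raw, nrs, se)
```

Because the easiness values entered have already been adjusted by the equating constants, the estimated abilities are taken to be on the common scale, so the sketch applies NRS = 200 + 10(ability) directly.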

5.4 Summary

Objective 1. To describe the methodology for test equating using
the Rasch Model. Both general Rasch equating methods and the specific
techniques used in this report appear in Chapter 3 of this Volume.
Rasch equating consists first of adjusting all ability estimates in one
test by a change of scale origin. The amount of change is the equating
constant. A second step is the matching of raw scores on the tests
to be equated. It is recommended that this second step, the raw
score-to-raw score equating, be deleted in future equating studies in
favor of raw score-to-common reference scale equating.
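
The two steps described under Objective 1 can be sketched as follows. The matching rule used here (pair each raw score on one test with the raw score on the other test whose adjusted ability is nearest) is an assumption made for illustration; the techniques actually used in this report are those of Chapter 3. The function name and the sample ability tables are hypothetical.

```python
def equate_raw_scores(abilities_x, const_x, abilities_y, const_y):
    """Map each raw score on test X to the test Y raw score of nearest ability.

    abilities_x, abilities_y: dicts of raw score -> log ability estimate.
    const_x, const_y: equating constants that shift each test's origin
    onto the common scale.
    """
    common_y = {raw: a + const_y for raw, a in abilities_y.items()}
    table = {}
    for raw_x, a in abilities_x.items():
        target = a + const_x  # step 1: change of scale origin
        # step 2: match to the Y raw score closest on the common scale
        table[raw_x] = min(common_y, key=lambda raw: abs(common_y[raw] - target))
    return table


# Hypothetical ability tables and constants, for illustration only.
test_x = {1: -1.0, 2: 0.0, 3: 1.0}
test_y = {1: -1.5, 2: -0.5, 3: 0.5, 4: 1.5}
print(equate_raw_scores(test_x, 0.5, test_y, 0.0))
```

Scoring each test directly on the common reference scale makes the second step unnecessary, which is the recommendation made above.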

Objective 2. To provide basic item analysis data for each test in
the data base. These data appear in Volume II. The first part of
Volume II gives traditional and Rasch item parameter estimates for all
tests. The second part of Volume II summarizes the item information
for each test and presents the relationships of selected
item statistics to the item mean square fit index.

Objective 3. To evaluate the fit of the Rasch Model with respect
to those tests that were part of the data base. Concepts of fit and
procedures for evaluating fit were discussed in Chapter 2. It is
recommended that test fit be determined primarily by the degree to which
specific objectivity is observed in the data. In particular, specific
objectivity in regard to scoring tables is relevant for assessing the
appropriateness of the model for use in equating. The evidence supports the
use of Rasch techniques for equating these tests. Moreover, the tests
appear to vary little in regard to fit considerations.

Objective 4. To investigate the stability of Rasch Model parameter
estimates under conditions of varying sample size and sample composition.
Chapter 2 presents the analyses appropriate to this objective. The
primary conclusion is that the data do appear to be highly consistent across
various sample sizes and compositions. However, anomalies are present
in some data in regard to intelligence or racial group differences.
We believe these perceived differences to be of minimal consequence in
regard to equating, but they are of some theoretical interest. In no case
do differences in scoring tables for different groups approach standard
errors in magnitude.
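
One simple way to express the last comparison is to divide each difference between two groups' scoring tables by the corresponding standard error. The sketch below is an illustration only, not the analysis of Chapter 2; the function name and the sample values are hypothetical.

```python
def max_difference_in_se_units(table_a, table_b, std_errors):
    """Largest |ability_a - ability_b| / SE over raw scores common to all inputs.

    table_a, table_b: dicts of raw score -> log ability estimate for two groups.
    std_errors: dict of raw score -> standard error of measurement.
    """
    shared = table_a.keys() & table_b.keys() & std_errors.keys()
    return max(abs(table_a[r] - table_b[r]) / std_errors[r] for r in shared)


# Hypothetical scoring tables for two groups, for illustration only.
group_a = {1: -1.0, 2: 0.0}
group_b = {1: -0.9, 2: 0.1}
errors = {1: 0.5, 2: 0.4}
print(max_difference_in_se_units(group_a, group_b, errors))
```

A result well below 1 would mean that no difference between the two tables approaches a standard error in size.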

Objective 5. To provide tables of equated scores based on Rasch
Model methods. Equating tables for raw score-to-raw score equating
appear in Appendices C and D of Volume I. Volume II presents raw
score-to-National Reference Scale calibrations.

Objective 6. To estimate the equating error associated with the
use of the above equating methods. Equating error estimates and estimation
procedures appear in Chapter 3 of this Volume. Appendices E and F
of this volume present measures of assignment error. Assignment error
appears to be a small but significant error source, which can be eliminated
by the use of the National Reference Scale. Other error sources appear
to be inconsequential when compared to usual standard errors of
measurement.

Objective 7. To compare the results of equating with those obtained
in the Anchor Test Study. The discussion of comparisons of the two
projects appears in Chapter 4 of this Volume. In most instances,
differences between our equatings and the Anchor Test Study equatings are
inconsequential.

TABLE 1

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= CAT 3-A Occurrences= 10 Parameters= ITEM

Item or           First Tests             Second Tests
Score Group     Mean   Std. Dev.        Mean   Std. Dev.

    1         2.0498     0.2327       1.9775     0.1840
    2         2.4548     0.1697       2.3706     0.2208
    3         1.7240     0.1648       1.6130     0.1185
    4         1.9373     0.1464       1.6958     0.1393
    5         1.1049     0.2815       1.1003     0.3493
    6         0.6460     0.1600       0.6604     0.1109
    7         1.5186     0.1571       1.4195     0.2290
    8         1.6226     0.1214       1.5306     0.1001
    9         1.5709     0.1061       1.3821     0.1403
   10         0.7353     0.1344       0.6373     0.1394
   11         1.5305     0.1373       1.4601     0.1184
   12         0.5222     0.2091       0.3584     0.2089
   13         0.5903     0.0579       0.5702     0.0974
   14        -0.0221     0.0977      -0.0961     0.1232
   15         0.5342     0.1490       0.4046     0.1563
   16         0.2409     0.1634      -0.0254     0.1293
   17         0.1926     0.1436       0.1370     0.1447
   18         0.78U0     0.1163       0.6^00     0.1214
   19        -0.0311     0.1755      -0.1684     0.1558
   20        -0.7584     0.0749      -0.7578     0.0770
   21        -0.0279     0.0802      -0.0639     0.1110
   22         0.3871     0.0729       0.3438     0.0699
   23        -0.7520     0.1119      -0.7159     0.1602
   24        -0.3077     0.0720      -0.2661     0.0968
   25         0.1385     0.0743       0.0307     0.1183
   26        -0.6314     0.1193      -0.5735     0.1348
   27        -0.7217     0.0748      -0.7015     0.1213
   28        -0.2365     0.1226      -0.2725     0.0699
   29        -0.8390     0.1297      -0.8972     0.1211
   30        -0.9109     0.0934      -0.7418     0.1286
   31        -1.0227     0.1048      -0.9614     0.0796
   32        -1.1413     0.0830      -1.0405     0.0873
   33        -1.23^2     0.1265      -1.0751     0.1395
   34        -1.0473     0.1035      -0.8406     0.1211
   35        -1.2496     0.1239      -1.1157     0.1370
   36        -1.2092     0.0770      -0.9508     0.0952
   37        -1.4660     0.1414      -1.2705     0.1889
   38        -1.8544     0.1104      -1.4964     0.0975
   39        -1.9813     0.0674      -1.7474     0.1721
   40        -2.8325     0.1297      -2.5738     0.1493

TABLE 2

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= CAT 4-A Occurrences= 7 Parameters= ITEM

Item or           First Tests             Second Tests
Score Group     Mean   Std. Dev.        Mean   Std. Dev.

    1         1.5103     0.0941       1.4750     0.1040
    2         1.1076     0.0990       0.9589     0.0854
    3         2.6943     0.1812       2.4806     0.2082
    4         1.5637     0.0973       1.^091     0.1682
    5         1.2744     0.0670       1.1789     0.0697
    6         0.6629     0.1129       0.5369     0.1437
    7         0.7447     0.1258       0.6680     0.0504
    8         1.2667     0.0720       1.2139     0.1400
    9         1.0073     0.0867       0.9904     0.0880
   10         1.1230     0.0770       0.9093     0.0850
   11         0.4564     0.1828       0.3459     0.1666
   12         1.3781     0.0888       1.1644     0.1596
   13         0.9286     0.0624       0.8456     0.0418
   14        -0.4697     0.0999      -0.3316     0.2546
   15         1.1181     0.0847       0.9530     0.0868
   16         1.4509     0.0785       1.2400     0.0387
   17         0.3716     0.0874       0.3410     0.1318
   18         0.7283     0.0668       0.6576     0.0705
   19         0.4987     0.0978       0.3676     0.0643
   20         0.6241     0.1218       0.5301     0.0755
   21         0.1446     0.0416       0.0611     0.1407
   22        -0.4406     0.1071      -0.5246     0.1457
   23        -0.5029     0.0403      -0.5793     0.0783
   24        -0.1074     0.0812       0.0270     0.0809
   25         0.2233     0.0660       0.2461     0.1043
   26        -0.0620     0.1217     0.004 SO     0.0778
   27        -0.4469     0.1029      -0.4879     0.0996
   28        -0.7099     0.0896      -0.5993     0.1141
   29        -1.0941     0.1799      -1.1059     0.1775
   30        -1.0204     0.1919      -0.9110     0.1227
   31        -1.1363     0.0533      -1.1147     0.0547
   32        -0.8220     0.0900      -0.6901     0.0816
   33        -1.9897     0.0853      -1.8253     0.1474
   34        -1.3496     0.0891      -1.2521     0.1030
   35        -1.6323     0.1125      -1.3854     0.1295
   36        -1.6946     0.1204      -1.3864     0.1166
   37        -1.5391     0.1322      -1.3231     0.1308
   38        -1.9686     0.1041      -1.7111     0.1744
   39        -1.5403     0.0670      -1.3906     0.1350
   40        -2.3500     0.2286      -1.9856     0.1321

TABLE 3

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= CTBS 2-Q Occurrences= 10 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev.



1 


2.4241 


0.1388 


2 


2.3136 


0.1213 


3 


1.2572 


0.1535 


4 


0.5229 


0.0972 


5 


0.8761 


0.1661 


6 


0.6404 


0.1163 


7 


1.4797 


0.2109 


8 


0.9069 


0.2171 


9 


0.7404 


0.2663 


10 


0.5865 


0.0886 


11 


1.2698 


0.2059 


12 


0.2228 


0.1342 


13 


0,6045 


0.1022 


14 


0.1812 


0.0821 


15 


0.4984 


0.1264 


16 


0.1178 


0.1075 


17 


0.7898 


^0,1488 


18 


0.4229 


0.2134 


19 


0.8176 


0.0802 


20 


-0.3107 ' 


0.1495 


21 - 


-0.3622 


0.1018 


22 


-0.7076 


0.1293 


23 


-1.1153 


0.1640 


24 


-0.9618 


0.1131 


25 


-0.4012 


0..1498 


26 


-0.4732 


0.1957 


27 


-0.1406 


0.0629 


28^ 


0.4969 .' 


' Q.1195 


29 


-0.68/y ~ 


0,0751 


30 


-<1.1759 


0.1326 


31 


-1.4752 


0.1005 


32 • 


■ -0.2592 


0.1370 


33 


-1.1755 


0.0966 


34 . 


0.1962 


0.0564 


35. 


-1.5701 


0.2729 


36 


-0\l844 


0.0941 


37 


-2.3909 


0.1152 


38 


-1.7958 


0.1015 


39 


-1.2081 


0.2189 


40 


-0.9703 


0.1337 



Mean 


Std. Dev. 


2.5987 


0.2087 


2.1740 


0.2091 


1.2316 


0.1542 


0.5432 


0.1380 


0.8435 


0.1876 


0.6557 


0.1107 


1.3911 


0 . 1425 


0.7739 


0 .1545 


0.6906 


r\ 010/ 

0.3134 . 


0.5224 


0.09DO 


1.1828 


0 .1545 


0.3252 


0.1103 


0. 5806 


r\ 1 Q /, 0 


0.1667 


0 . IjOo 


0.4481 


0 . loy 0 


0.1492 


U . UyDJ 

J, 


0.7133 


0 . 14/ y 


0.3205 




0.7135 


0 . 1445 


-0.2900 


c\ c "7 
0 *0o5 / 


-0.3479 


u .uy J J 


-0.7281 


0.0896 


-"-0.9798^ 


0 .1508 


-0.8868 


0.1573 


-0.3666 


0.1626 


-0.3787 


0.1745 


-0.2162 


0.0833 


0.3888 


0.1326 


-0.7143^ 


0.1394 


-1.213,^',-^, 


0.1350 


-1.6119 ^ 


0.1362 


-0.2548 


0.2499 


-1,1643 


0.0712 


- 0.2757 


0.1037 C> 


-1.4737 


0.2560 


-0.1261 


0.1059 


-2.2475 


0.2741 


-1.7237 


0.1404 


-1.1042' 


0.1592 


-0.8606 


0.0676 



TABLE 4

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= CTBS 3-Q Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group



1 
2 
3 
4 
5 
6 
7 
8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 
22 

23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
AO 



Mean 


o cQ • uev . 


I ICO 11 


z • Ibi / 


n 1 '\r\i 
u . ± jU / 


2^0050 


1 Ci/, C •} 

i • DHJJ 


n 1 99Q 
u . ± ZZ 7 


1.5389 


1 /. 0 Q n 


U . \)y JD 




1 7 1 1 Q 
1 . /iJ J 


u . u ouz 


1 f^'^l 1 


1 * UUoZ 


U » UD / J 


1 DASf^ 


n Q ^7 7 


n 1 n9Q 


0 7ftS1 


U.JOZU 




n R7no 


U . ooo J 


n n7i 7 


^J . / (J4 X 


U . ^ / J / 


V . u ouu 


0.5687 






0.2241 




U . ±UO D 


0.9460 


n /. A c 7 


n 1117 


n 1687 


1 , zybo 


n T n ft A 


1.269-3 


0. 


n nQn9 
* u . u yu z 


\J * ^ ^ X X 


0 . iloo 




0 3517 


-0 . lOoo 


n nQQ 7 


-0. 2466 




n n "^Q 7 


-0.2370 


-0.1855 


U . u j± 0 




-0 . 5788 


n nft 9 ^ 

U . UoZ J 


-0. 5680 


0. 0488 


n 1 9 A A 


0. 0280 


0.0023 


U . U / zu 


-0.0546 


1 . 0767 


n n 71 Q 


0.9363 


-0. 8563 


U ,.Udo J 


-0 8986 


-0, 3430 . 


n 11^7 




-0. 4885 


n 1110 


-n 3976 


-0 . 6660 


U . U jy J 


U . U U J u 


n C / 7 O 

-0 . 5478 


n n7 *7ft 
U . U / JO 


-n sn6i 






-1.1296 


0.0275 


0.0567 


0.0581 


-0.2238 


0.1012 


-0.2600 


-0.7575 


0.1926 


-0; 6583 


-1.3503 


0.0973 


-1.3451 


-0.6725 


0.1290 


-0.6906 


-0.9588 


0.1057 


-0.9140 


-0.7697 - 


0.0784 


-0.6949 


-1.2183 


•0.1349 


-1.0719 


-0.2035 


0.1172 


-0.0766 


-1.5220 


0.0801 


-1.4830 


-1.3422 


0.1289 


-1.2029 


-1.3862 


0.0529 


-1.3619 



Std. Dev . 

0.1781 
0.2211 
0.0868 
0.2202 
0.1034 
0.1122 
0.1671 
0.1961 
0.1310 
0.0878 
0.1387 
0.0972 
0.0835 
0.1085 
0.1251 
0.1106 
0.0378 
0.2137 
0.1062 
0.0918 
0.1287 
0,.1202 
0.2024 
0.1035 
0.0637 
0.1859 
0.0843 
0.4074 
0.1005 
0.0913 
0.1570 
0.04i0 
0.1317 
0.1070 
0.0914 
0.1291 
0.1142 
'0.2070 
0.1039 
0.1388 

TABLE 5

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= ITBS 10-5 Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group

1 

2 
3 
4 
5 
6 
7 
8 
9 

11 
12 
13 
14 

15 > 
16 
17 
18 
19 
' 20 
21 
22 
,23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 



Mean 

0.9644 
0.3614 
0.5631 
0.3366 
1.4673 
-0.0159 
0.6244 
1.1366 
0.3806 
0.0889 
0.1189 
1.2013 
0.5003 
0.1190 
0.1967 
-0.0707 
1.0481 
• 0.1746 
0.2993 
-0.6230' 
-0.6319 
-0.1323 
-1.0774 
-0.1549 
0.4214 
-0.0087 
-1.3621 
-0.2820 
-0.9530 
-0.9494 
-1.2101 
-0.4654 ■ 
-0.3703 
-0.3391 
-0.1103 
-0.0290 
-0.7917 
-0.4261 



O UCl • L/C V • 


Mean 


Std. Dev. 




0.8813 


0.0742 




0.4270 


0.0619 


0 1970 


0.6207 


0.1459 


\J • ^ J ^ £. 


0.3126 


0.0984 




1.349.1 


0.1392 




-0.0833 


0.1485 


\J • J. H U t/ 


0.6636 


0.1233 




1.0116 


0.1553 


0.1156 


0.3727 


0.1111 


0.0938 


0.0667 


0.1355 




0 . 199 A 


0.2358 




1 . lOA 7 


0.1046 


0.1120 


O.AIAO 


0.1266 


0.1081 


0.0669 


0.0385 


0 1962 


0.1796 


0.1136 


n 1 


-0.2709 


' 0.0360 


0.0A98 


1.0269 


0.1097 


n 0856 


0.2077 


0.1176 


nil 8*2 


0.228A 


0.1026 


n 1 m 1 

U . J.UX J. 


-0 . 5016 


0.1914 




-0.6900 


0.^0944 


n 1 068 
U.J. uoo 


-0.1733 


0.102 7 


n 1 778 
U . i- / / o 


-1.0673 


0.2265 




-0.1084 


0.1265 


0.0796 


0.4037 


0.1159 


0.1049 


0.0667 


0.1336 


0.1440 


-i;3677 


0.1167 


0.0870 


-0.2323 


0.0703 


0.1385 


-0.9620 


0.1086 


0.1929 


-0.-9701 


0.1429 


• 0.2462 


-1.1777 


0.1413 


0.1501 


-0.4540 , 


0.0878 


0.1462 


-0.2616 


0.1391 


0.1473 


-0.2111 


0.0856 


0.1637 


-0.1020 


0.0634 


0.1929 


-0.0021 


0.1157 


0.1017 


-0.6383 


0.1859 


0.1248 


-0.3291 


0.1291 

TABLE 6

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= ITBS 11-5 Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev. Mean Std. Dev.


1 


0.5319 


0.1414 


0.4641 


0.1796 


2 


1.0351 


0.1398 


0.9440 


0.1739 


3 


0.2713 


0.1038 


0.1477 


0.1203 


A 


1.4191 


0.1905 


1.2840 


0.1658 


5 


0.7973 


0.1065 


0.7630 


0.0705 


6 


0.8113 


0.1592 


0.7226 


0.0966 


7 


0.2530 


0.1468 


0.2579 


0.1377 


8 


•-0.O32O 


0.0634 


-0.0171 


0.1209 


9 


0.5607 


0.114& 


0.54X6 - 


0.0678 


10 


-0.4419 


0.1349 


-0.521Q 


0.1305 


11 


0.5151 


0.0617 


0.3910 


0.1221 


12 


0.9530 


0.1751 


0.9143' 


^ 0.1013 


13 


0.4780 


0.1310 


0.4407 


X).1208 


lA 


-p. 8807 


0.1143 


-0.8839 - 


0.0870 


15 ,. 


0.4661 


0.0916 


■ 0.4113 


0.1052 


> 16 


-0.3209 ^ 


0.1980 


-0.3313 


0.1202- 


17 


-0.5431 


0.1362 


-0.6570 


0.0982 


18 


-0.8744 


■ 0.2679 


-0.8674 


0.1773 


"19 


0.1600 


0.0783 


. 0.1090 


0.1411 


20 


0.1597 


0.0873 


0.2124 


0.1085 


21 


0.2897 


0.0670 


0.2097 


0.1219 


22 


0.4409 


0.0804 


0.3646 


0.0669 


23 


0.6864 


0.0921 


0.5466 


0.1605 


24 


-0.0869 


0.1139 


-0.0191 


0.1339 


25 


0.3574 


0.0990 


0.1730 


0.1081 


26 


-0.0$69 


0.0890 


-0.0091 


Oi0820 


27 


-0.3544 


0.0819 


-0.3034 


0.1414 


28 


0.0904 


0.1416 


0.0487 


0.0918 


29 


-0.2283 


0.0842 


-0.1091 


0.1156. 


30 


-0.0764 


0.2122 


-0.1064 


0.1172 


31 


0.1353 


0.1078 


0.1307 


0.0920 


32 


-0.3660 


0.2745 


-0,2366 


0.0837 


33 


-0.1304. ' 


0.1389 " 


-0.1276 


0.1191 


34 


-0.5649 


0.0662 


-0.5110 


0.0484 


35 


-1.3156 


0.1737 


-1.1819 


.0.1066 


36 


0.1460 


0.0672 


0.1997 


0.0677 


37 


-0.8843 


0.1252 


-0.8327 


0.1223 


38 


-0.6320 


0.1538 


-0.4821 


0.1104 


39 


-0.1641 


0.1267 


-0.0873 


0.0903 


40 


0.5797 


0.1085 


0.6589 


0.0969 


41 


-2.1354 


0.1515 


-1.9664 


0.1216 


42 


-0.4880 


0.1235 


-0.3641 


0.0481 


45 


-0.5196 


0.0993 


-0.3196 


0.1305 

TABLE 7

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= ITBS 12-5 Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev. Mean Std. Dev.


1 


0.3483 




0.3180 


0.0855 


2 


-0.3824 




-0.4784 


0.1190 


3 


0.5259 




0.5726 


0.1387 


4 


0.6730 




0.6640 


0.1048 


5 


0.9774 


n 11 00 


0.8949 


0.1492 


6 


0..3991 


U . iDoU 


0.9137 


0.1067 


7 


1.2529 


n " 1 0 0 < 


1.13L3 


0.0963 


8 


0.6650 


f\ 10/1 

U . 1341 


0.6440 


0.1208 


9 


0.8814 


n 1 /. o Q 
V . 14Zy 


0.8530 


0.1263 


10 


0.5927 " 


,U. 05ol 


0.5287 


0.0869 


11 


0.1550 


f\ Ciii/ o 

O.,0o42 


0.2017 


0.1380 


• 12 


1.0674 


U. luvZ 


0.9080 


0.1614 


13 


0. 3121 


Pi r\ A 1 o 

0. 0933 


0.2803 


0.1160 


14 


0.6157 


0. 1298 


0.5697 


0.1396 


15 


l.OOSO 


0 .1220 


1 .0316 


0.0367 


16 ' 


0.80^6 


0.1 328 


0.7581 


0:0839 


17 


0.690L 


0. 1364 


0.6206 


0.0909 


18 


0.2493 


0 . 0330 


0.3176 


0.1152 


19 






-0.5011 


0.1706 


20 


1.0719 


0.1117 


0.9861 


0.1106 


21 


-0.0941 


0.1133 


-0.1153 ^ 


0.0541 


22 " 


-0.0183 


0. 2016 


0.0473 


0.1168 


23 


0.8173 


A T /" O T 

0 . lo2 / 


0.8284 


0.1383 


24 


1.5011 


A T O T T 
O.lZl/ 


1.3339 


, 0.1398 


25 


-1.4440 


. 0.1376 


-1.3469 


' 0.1027 


26 


0.3573 


A AO "7 0 

0 . 0873 


0.3003 


0.1140 


27 


0.3304 


A T A 1 A 
O.IOIC) ^ 


0.3864 


0.0899 


28 


-0.3207 


0, 0/DO 


-0.3379 


0.0717 


29 


-0.1340 


A 1 Q 


" -0.0703 


0.1255 


30 


-0.5713 


A 1 A A A 


r0.5169 


0.1439 


31 


-1.4844 


A 1 0 Q A 


-1.3833 


0.1741 


32 


-0.0404 


A CiC "7 Q 


-0.1219 


0.0739 


33 


-1.1316 


A 1 AO 0 

0.1033 


-1.1027^ 


0.1511 


34 


-1 ,3811 


A AO Q /. 


' -1.3057 


0.0723 


35 


0.2459 


A- 1 A^ Q 


0.2217 


0.0730 


36 


-0.5161 


A 1 AAA 

U . LUbU 


-0.3989 


0.1641 


37 


-0.7723 


A 1 1 Q Q 

U.lioo 


-0.6506 


0.1252 


38 


0.4534 


A T A/ C 

0 .104!) 


0.4351 


0.1206 


39 


-0.0463 


A A A ^ O 

0. 0962 


-0.0879 


0.0846 


40 


-0.7716 


0. 1164 


-0.7640 


0.1000 


41 


-0.7023 


A A"7 O 

0*078o 


-0.6739 


0.0957 


42 


-0.6829 


A 091 ft 


-0.6,959 


0.0824 


43 


-0.3851 


0.0883 


-0.4159 


0.1059 


44 


-1.0391 


0.1483 


-0.9013 


0.141^1 


45 


-0.7741 


0.0803 


-0.7471 


0.0950 


46 


-3.3126 


0.2602 


-3.1317 


0.3282 

TABLE 8

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= MAT E-F Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group

1 
2 
3 
A 
5 
6 
'7 
8 

11 

12- 
13 
lA 
15 
1^ 
17 
18 
19 
20 
21 
22 
2 -J 
?A 
25 
26 
2 7 
?8 
29 
30 
31 
32 
■ 33 
34 
35 
. 36 
37 ' 

39 
.♦0 
61 
42 
A3 
AA 
A5 
A6 
A 7 

/.8 
A 9 
50 



Mean 

1.2677 
2.3920 
2.2167 
0.9744 
l.,869A 
1.9656 
1.8903 
1.1366 
1.3049 
0.721A 
.1.3890 
0.9313 
0.7043 
.8863 
0.2^707 
0.5697 
0.9280 
0.5513 
0.05^16 
1.2351- 
0.3646 
0.O54A 
0.3781 
-0.6839 
0.0801 
-0.2837 
0.1056 
-0.0894 
-0.422A 
0.0079 
-0.336A 
-0.0410 
0.0731 
-0.748A 
-0.5623 
-0.fS723 
-0.4650 
-0.2577 
-0.8949 
-0.8680 
-0.9440 
-1.04 2 A 
-1.3386 
-2. 3024 
-1.526A 
-1.A263 
-1.8746 
-2.3379 
-2.5801 
-2.8087 



Std. Dev. 



0.3207 
0.2155 
0.3333 
0.1601 
0.3415 
0.1478 
0.1923 
O.IOAO 
0.1A75 
0.2107 
0.1484 
0.23A4 
0.1349 
0.1543 
0.1613 
0.1499 
0.1980 
0.1608 
0.2299 
0.1847 
0.1678 
0.1396 
-0.1290 
0.1208 
O.0897 
0.1865 
0.1011, 
0.1218'' 
0.1083 
O.15A0 
0.1060 
0.0731 
0.1475 
0.1372 
0.2130 
0.0924- 
0.1469" 
0.0760 
0.1257 
O.07A4 ' 
0.1230V 
0.1028 
0.1498 
0.1941 
0.1306 
0.11A9 
0.1624 
0.1281 
0.2478 
0.3424 



Mean 

1.2336 
2.3931 
2.1144 
0.9407 
1.7806 
1.8623 
1.8001 
1.2974 
1.3789 
0.7033 
1.3799 
0.8729 
0.6599 
0.6793 
0.2777 
0.4867 
0.7841 
, 0.5343 
-0.1591 • 
1.0781 
0.2201 
-0.0191 
6.4076 
-0.5843 
0.0674 
-0.A266 
-{);1174 
-0.1067 
-0.4159 
-0.0211 
-0.2303 
-0.0939 
0.0921 
-0.6471 
-0.3984 
-0.6879 
-0.4641/ 
-0.2396 
-0.8386 
-0.8340 
-0.8226 
-1.043A 
-1.1797 
-2.1786 
-1.4 651 
-1.3141 
-1.6839 
-2.1850 
-2.3687 
-2.7539 



Std. Dev. 

0.A56A 
0.2695 
0.2506 
0.1714 
0.1891 
0.1812 

'0.2856 
0.1666 
0.1776 
0.1588 
0.2173 
■ 0.1762 
0.1617 
0.1911 ■ 
0.2196 
0.1899 
0.0856 
0.1-980 

' 0.1548 
0.1026 
0.1041 
0.1579 
0.1834 
0.1645 
0.1265 
0.0760 
0.1215 
0.1301 
0.0785 
0.0941 
0.1127 
0.1635 
0.1036 
0.1195 
0.2219 
0.1926 
0.1210 
0.0621 
0.2336 
0.0898 
0.1039 
0.1154 
0.1388 
0.2217 
0.1899 
0.1322 
0.1829 
0.2626 
0.1548 
0.4082 

TABLE 9

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= MAT I-F Occurrences= 11 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev.


1 


1.7875 


0.1231 


2 


1.8618 


0. 2293 


3 


2.8746 


0 . 2053 


4 


1.7913 


0.1214' 


5 


1.3184 


0.1900 


6 


2.2526 


0.2103 


7 


0.9868 


0.1119 


8 


1.8265 


0. 2051 


9 


-1.4533 


0.1634 


10 


1.0074 


0.0994 


11 


1.1738 


0.1394 


12 


0.8155 


0. 0801 


13 


0.9325 


0.1730 


14 


0.4475 


0.0991 


15 


1.0874 


0.1672 


16 


0.8314 


r\ 1/00 

0.1488 


17 


0.2849 


0.1008 


18 


0.4831 


0.1145 


19 


0.4740 


0.' 056 


20 


-0.5D38 


0.1872 


21 


0.0946 


0.1798 


22 


0.7900 


0.1282 


23 


0.4773 


0.1511 




0.4128 


0. 1450 


.25 


0.1219 


0. 1249 


26 


-0.5192 


0.0940 


27 


0.2428 


. 0.1605 


28 


0.4071 


0.1581 


29 


-0.0671 


0.1128 


30 


-1.5095 


0.2384 


31 


0.4855 


0.1131 


32 


-0.3167 


0.1350 


33 


-0.1337^ 


0.1077 


34 


7O.3328 


' 0.1559 


35 


-0.8668 


0<4ll73 


36 


-0.9211 


0.1283 


37 


-0.4095 


0.0876 


38 


-1.2160 


0.1251 


J9 


-0.6285 


0. 1652 


40 


-1.2858 


n 1 /. A ^ 


<41 


— u . / 0 / J 


yj . J. 7 


42 


-0.9502 


0.1191 


43 


-1.7572 


0.1888 


44 


-0.7424 


0.1217 


45 ^' 


-1.5903 


0.1287 


46 


-1.5681 


Q.1565 


47 


-1.4717 


6.1090 


48 


-2.2412 


0.1453 


49 


-1.5863 


0.1535 


50 


-2.2303 


0.1414 



-1 



Mean 

1.7282 
7672 
6418 
1.7307 
2148 
0461 
0.9337 
1.7251 
3423 
0.8492 
1.0371 
0.7990 
0.7528 
0.4350 
1.0626 
0.6883 
0.3303 
0.4448 
0.4559 
-0.5255 
0.0576 
0.6945 
0.4585 
0.6327 
0.0827 
-0.5216 
0,.2332 
0.3362 
-0.1038 
3536 
0.5235 
•0.2931 
•0.1230 
-0.3949 
-0.8987 
-0.8302 
-0.3632 
1051 
-0.6635 
-1.1510 
-0.7702 
-0.9877 
-1.7195 
-0.6772 
-i:4580 
-1.5159 
-1.3222 
-1.9761 
-1.4638 
-2.0824 



-1 



-1, 



Std. Dev. 

0.1834 
0.2128 
0.1686 
0.1604 
0.1187 
0.1503 
0.1381 
0.1591 
0.2389 
0.1460 
0.1628 
0.0992 
0.1957 
0.0901 
0.1472 
0.1548 
0.1278 
0.0834 
0.1383 
0.1508 
0.2007 
0.1199 
0.1731 
0.1835 
0.1410 
0.1021 
0.1230 
0,1922 
0,1377 
0.2625 
0.0919 
0.1651 
0.0890 
0.1213 
0.1040 
0.1333 
0.0832 
0.1583 
0.1071 
0.1617 
0.-"»14 
0.1121 
^ 0.1925 
0.1070 
0.1757 
0.1767 
0.1450 
0.1025 
0.1216 
0.1987 



TABLE 10

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= STEP 4-A Occurrences= 14 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev. Mean Std. Dev.


1 


* 

1.7416 


0.2328 


1. 710> 


n 1AAQ 

u • itfoy 


2 


2.5837 


0.2235 


^ .lyyy 


u • ZUOO 


' 3 


2.1171 


0.1400 


1 . yi4c 


n 1 Ai ^ 

U .10/ J 


4 


1.3245 


0.1227 


i.iy Jo 


U . 1j JC5 


5 


1.6301 


0.1305 


1 At/./ 


n 1 0 *i A 


6 


1.1964 


0.1469 


T T /. O O 

I.IAZ? 


U . Ijyj 


7 


1.8640 


0.1686 


1* 0 7 o 4 


u . i joU 


8 


1.4082 


0.1598 


1. ioo^» 


u . l^U / 


9 


1.4436 


0.1804 


T 0 "7 0 


n 1 QQi 


10 


0.1004 


0.1084 


U . iJ^tf 


u . 1 J 


11 


0.0825 


0. l0^4 


u . U / O'* 


n 1 '^QQ 


12 


0.6194 


0.1885 


U . jU^O 


n 1 1 fil 

u • XX ox 


13 


0.9569 


0.1376 


u . / yoo 


U . XX / z 


14 


0.5309 


0.1780 




U • XQ ^ J 


15 


0.2739 


0.0980 


n 1 AQA 




16 


-0.1106 


0.1236 1 


"~U . UXDD 


n 1 1 AA 

U ft XXHt 


17 


0.5616 


o.oyDD/ 


n m AO 


n 1 1 0ft 

\J . xxuo 


18 


-0.1144 


0.0792/ 


— U . 13Z J 


n 1 1 '^7 

V J . X X J / 


19 


-0.0388 


0.098^ 


— n (T^QA 
— U . U J ?0 


0 Oft?? 


20 


-1.6746 


0.170/ 


—1 . 


n 1 fins 
u . xuuo 


21 


-1.5397 


0.0760 


— 1 . J /Ul 




22 


-U. 9489 


U . UOl J 


-0.8854 


0.1247 


23 


-0.5434 


0.2055 


-0.5039 


0.2170 


'24 


-1.6559 


0.14^57 


-1.3937 


0.1650 


25 


-1.6771 


0.1f26 


-1.5816 


0.0996 


26 


-1.0174 


0.0,817 


-0.8279 


0.0785 


27 


-2.0459 


. 0.?089- 


-1.8907 


0.2473 


28 


-2.0142 


0.^387 


-1.8177' 


0.0802 


29 


-2.6077 ' 


0./1857 


^ 0-2.3:^91 


0.1665 


30 


-2.4453 


0/1499 


-2.1035 


0.1391 



/ 

TABLE 11

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= SRA BL-E Occurrences= 10 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev.



1 


1.7825 


2 


0.9286 


3 


0.0430 


4 


-0.0044 


5 


0.6635 


6 


0.2397 


7 


0.7048 


8 


-0.3140 


9 


0.4093 


10 


-0.1368 


11 


-0.4593 


12 


-0.8615 


13 


1.7003 


14 


1.8279 


15 


0.2265 


16 


0.0794 


17 


0.8716 


18 ■ 


0.8462 


19 


0.6463 


20 


0.8052 


21 


0.8445 


22 


0.5998 


23 


0.0033 


24 


-0.6157 


25 


-0.2315 


26 


-0.0656 


27 


0.4449 


28 


-0.3577 


29 . ■ 


0.6562 


30 


0.4A68 


31 


-0.0970 


32 


-1.0099 


33 


-0.5421 


34 


-0.4805 


35 


-0..7028 


36 


-0.9998 


37 


-1.1236 


38 


-1.7228 


39 


-1.3579 


AO 


-1.2304 


41 


-1.5955 


42 


-0.8610 



Mean Std. Dev. 



0.1944 

0.1534 

0.1658 

0.0972 

0.0961 

0.0494 

0.1274 

0.0990 

0.1053 

0.0616 

0.0707 

0.0905 

0.1625 

0 .°2020 

0.0896 

0.0783 

0.0878 

0.1538 

0.1259 

0.0815 - 

0.i487..: ■ 

0.0986 

0.0906 

0.1133 

0.0845 

0.0875 

0.0901 

0.1677 

0.0843 

0.0860 

0.0537 

0.1206 

0.0833 

0.1398 

0.0899 

0.1151 

0.0998 

0.1715 

0.1743 

0.1198 

0.1572 

0.1122 



1.6379 
0.7214 
-0.2260 
0.0050 
0.4817 
0.1554 
0.5253 
-0.2940 
p. 3248 
-0.1258 
-0.5446 
-0.7487 
1.6090 
1.807 2 
0.2471 
0 ."1184 
0.7876 
0.7840 
0.5523 " 
0.7693 
V 0.8375 
0.6534 
-0.0750 
-0.5149 
-0.1345 
0.0685 
0.4439 
-0.2750 
0.6901 
0.4207 
-0.0347 
-"■-0.8242 
-0.3797 
-0.4698 
-0.6798 
-0.9930 
-1.0555 
-1.6266 
-1.2930 
-1.1674 
-1.4338 
-0.7439 



0.1902 
^ 0,1541 
0.1938 
0.0720 
0.1164 
0.0853 
0.1078 
0.0900 
"0.1207 
0.1045 
0.1456 
0.1451 
0.1581 

- 0.2080 
0.1142 
0.1153 
0.1300 
0.1627 
0.1101 
0.1469 
0.1575 
0.1156 
0.0811 
0.1212 
0.1446 
0.1182 
0.0879 
0.1270 
0.1085 
0.1537 
0.0768 
0.1023 
0.0913 
0.1241 
0.0915 
0.1889 
0.0975 

' 0.2191 
0.1816 
. 0.1102 
0.1990 
0.1763 

TABLE 12

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= SRA GR-E Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev. Mean Std. Dev.


1 


1.^584 , 


0.1072 


1 .olol 


0.1715 


2 


0.9703 


0.1269 


0.9201 


0.1386 


3 


1.6793. 


0.1325 


1 . 5026 


0.2056 


4 


1.013C 


0.0801 


n o o ^ 

0.8396 


0.1114 


5 


1.2079 


0 ^ 14 65 


0.8543 


0.1298 


6 


0.7570 


0.1431 


coo/" 

0.5826 


0.0904 


7 


-0.8400 


0.0869 


-0.859A 


0. 0.1405 


8 


-0.7013 


0.1634 


-0.8397 


0,1302 


9 


-0.3633 


0.0558 


-0.A566 


0.1228 


10 


-1.1234 


0. 1800 


-0.9683 


0.158'8 


11 


0.0757 


0.0725 


r\ TOO/ 

0.1384 


' 0.1135 


12 . 


-0.0753 


0.1090 


0. 0441 


0.1159 


13 


0.4679 


0.1407 


0.5013 


0.1047 


14 


1.8524 


0.1033 , 


1 .8820 


0.1977 


15 


1.5134 


0.1178 


1.6219 


0.2015 


16 


0.7674 


0.1747 


0.8293 


0,1514 


17 


0.1356 


0.1105 


0.2270 


0.0627 


18 


,.0.2603 


0. 1525 


0.Z701 


0.0599 


19 


0.5734 


0.1191 


0.^521 


0.0953 


20 


0.2104 


0.1384 


0. 2347 


0.12S6 


21 


-0.0147 


0.1105 


-0.1134 


0.1405 


22 


-0.0397 


0.1590 


0 .03/4 


0.1201 


• 23 


-0.7321 


0.1580 


r\ o o o 

-U. /o39 


0.1525 


24 


-0.4979 


0 . 1140 


r\ A /. /, o 


0.1442 


25 


0.0696 


0.0998 


— 0 .OUUi - 


0.0661 


26 


-0.5816 


O.IO-^S 


r\ A z: o Q 


0.1885 


27 


0.4029 


0.1217 




0.0825 


28 


0.7394 


0.1606 


0.7799 ' 


0.1200 


29 


0.3617 


0.09A8 


0.4830 


0.0913 


30 


-0.6443 


0.1012 


-0. 6770 


• 0,2179 


31 


-0.2959 


0.08A5 


-0. J160 


0.1488 


32 


0.2593 


0.0707 


0.3174 


0.1071 


33 


-0.0254 


0.0910 


-0 . 0434 


= 0.0878 


34 ' 


'■ -0.6014 


0.1403 


•^0.6287 


0.1115 


35 


-0.5300 


0.18AA 


-0.4473 


0.1170 


36 


-0:9596 ■ 


0.1100 


-:1.0350 


0.1258> 


37 


-0.9693 


0.1198 


— 0 . y / '(I 


0.1537 


38 


■ -0.6266 


. 0'.1877 


-0.5109 


0.1173 


39 


. -1.2647 


0.2367 


-1.2779 


0.1873- 


40 


-1.2503 , 


0.1312 


-1.1330 


0.1460 


41 


-1.7497 


0.1371 


-1.7093 


0.2419 


42 


-1.2890 


0,1053 


-1.1766 


0.0521 



TABLE 13

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= SAT I-W Occurrences= 7 Parameters= ITEM

First Tests Second Tests

Item or
Score Group Mean Std. Dev. Mean Std. Dev.


1 


2.0880 


0.1165 ■ 


1.8681 


0.1821 


2 


1.5809 


6.1088 


1.4769 


0.1248 


. 3 


1.5784 


0.0679 


1.3526 


0.1948 


4 


^, 2.1670 


0. 2457 


1. 9509 


0.3243 


5 


1.4883 


0.1451 


1.3403 


0.0576 


6 


1.1160 


0. 2293 


0.9277 


0.1525 


7 


0.7529 


0.1614 


0.6663 


0.1140 


8 • 


0.7496 


0.1013 


0.5686 


0.0630 


9 


1.9506 


0.1764 


1.6806 


0.1224 


10 


0.9910 


0.0991 


0.8511 


0.0811 


11 


0.5067 


0.1562 


0.4509 


0.1395 ' 


'12 


0.7859 


0.11&,2 


0.6164 


0.0754 


13 


0.5359 


0.1343' 


0.3826 


0.1106 


14 


0.6540 


0.0982 


0.3813 


0.D888 


15 


0.6017 


0.0871 


0.4834 


0.0715 


16 


0.7134 


0^272' 


0.5753 


0.0837 


17 


1.4861 


0.1217 


1.3519 


0.1337 


18 


0.0211 


0.0997 


0.0003 


0.1493 , 


19 


0.3069 


0.2001 


0.2569 


0.2275 


20 


0.0657 


0.0893, 


0.1247 


0.0943 


21 


0.058 6 


0.1424 


0.0083 


0.0619 


22 


-0.3270 


0.1043 


-0.2620 


0.1429 


23 


-0.0790 


0.1364 


-0.1889 ■ 


0.1795 


24 


-0.4794 


0.1863 


-0.5789 


0.1098 


25 


-0.5296 


0.1282 


-0.4024 


a:t)97i 


26 


-0.2847 


0.1038 


-0.2373 


0.0853 


27 


-0.6016 


0.0552, 


-0.5956 


0.1153 


28 


-0.8647 


0.0397 


-0.7280 


0.1518 


29 


-1.0494 


• 0.0952 


-1.0717 


0.1644 


30 


-0.8356 


0.1567 


-0.6317 


0.1240 


31 


-1.6864 


0.1437 


-1.3673 


0.1587 


32 


-1.3829 


0.1134 


-1.1346 


0.1525 


33 


-2.2833 


0.0849 


-1.9481 


0.2423 


34 


-2.2913 


0.1892 


-2.0277 


0.1877 


35 


-2.0501 


0.1952 


-1.7249 


0.1237 


36 


-1.8276 


0.1233 


-1.4270 


0.1603 


37 


-1.7496 


0.1514 


-1.4867 


0.1935 


38 


-1.8763 


0.0795 


-1.5010 


0.1497 
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TABLE 14 



Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= SAT II-W    Occurrences= 11    Parameters= ITEM

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.

          1      2.0992      0.2102      1.8619      0.1787
          2      2.7114      0.1771      2.4859      0.2528
          3      2.0149      0.1547      1.7919      0.1395
          4      2.2099      0.2049      2.0188      0.2728
          5      1.?877      0.1721      1.1189      0.2872
          6      1.9988      0.1868      1.7195      0.1451
          7      1.2257      0.0812      1.1258      0.1355
          8      1.1120      0.2086      0.8832      0.1988
          9      0.6926      0.1672      0.6229      0.1922
         10      2.1144      0.1623      1.8882      0.1525
         11      0.9869      0.1252      0.8254      0.1518
         12      0.7025      0.0841      0.4896      0.1152
         13      0.1422      0.1029      0.0745      0.120?
         14      0.4325      0.1645      0.3348      0.1587
         15      0.1945      0.3135      0.1305      0.1643
         16      0.6322      0.0798      0.4645      0.0???
         17      0.3766      0.0650      0.2004      0.0956
         18      0.4205      0.1574      0.4045      0.121?
         19     -0.0295      0.1069     -0.1446      0.0921
         20     -0.3869      0.1155     -0.4192      0.1805
         21      1.4365      0.1376      1.2375      0.1375
         22      0.5235      0.1880      0.3786      0.1363
         23      0.3809      0.0596      0.3000      0.0957
         24      0.0714      0.1706     -0.0033      0.1852
         25     -0.1297      0.1206     -0.1958      0.1032
         26      0.4255      0.0994      0.3533      0.0420
         27      0.1147      0.1142      0.0660      0.0608
         28     -0.2129      0.0902     -0.1375      0.1734
         29      0.4885      0.1330      0.3836      0.1164
         30     -0.7130      0.1724     -0.6513      0.1982
         31     -0.2105      0.1148     -0.2569      0.1065
         32     -0.0564      0.1236     -0.0614      0.1410
         33     -1.1771      0.0998     -1.0502      0.1274
         34     -0.8503      0.0988     -0.7768      0.1352
         35     -1.3472      0.1552     -1.1019      0.1695
         36     -0.8431      0.0482     -0.7933      0.0881
         37     -0.5733      0.1481     -0.5079      0.1638
         38     -0.2295      0.2085     -0.1845      0.1?61
         39     -1.2176      0.1538     -1.0015      0.1028
         40     -0.8815      0.1613     -0.7504      0.1322
         41     -1.2554      0.1531     -0.9806      0.1212
         42     -1.2457      0.1446     -0.9895      0.1661
         43     -1.8933      0.1776     -1.5045      0.1708
         44     -1.5611      0.1375     -1.2322      0.1298
         45     -1.9595      0.1544     -1.6532      0.1590
         46     -2.2540      0.2364     -1.8905      0.0923
         47     -2.4857      0.1540     -2.1035      0.1915
         48     -3.1814      0.4536     -2.7701      0.3433
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TABLE 15 



Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= CAT 3-A    Occurrences= 10    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.

          1     -4.3301      0.0506     -4.2470      0.0390
          2     -3.5717      0.0467     -3.4909      0.0346
          3     -3.1007      0.0430     -3.0225      0.0309
          4     -2.7474      0.0395     -2.6719      0.0281
          5     -2.4581      0.0360     -2.3855      0.0263
          6     -2.2093      0.0327     -2.1397      0.0245
          7     -1.9880      0.0295     -1.9221      0.0230
          8     -1.7868      0.0264     -1.7246      0.0219
          9     -1.6012      0.0236     -1.5424      0.0210
         10     -1.4273      0.0211     -1.3722      0.0199
         11     -1.2625      0.0182     -1.2114      0.0191
         12     -1.1052      0.0159     -1.0586      0.0179
         13     -0.9540      0.0135     -0.9116      0.0167
         14     -0.8080      0.0115     -0.7698      0.0156
         15     -0.6659      0.0098     -0.6324      0.0144
         16     -0.5267      0.0083     -0.4981      0.0133
         17     -0.3905      0.0074     -0.3662      0.0120
         18     -0.2559      0.0069     -0.2366      0.0106
         19     -0.1226      0.0071     -0.1085      0.0095
         20      0.0100      0.0078      0.0188      0.0083
         21      0.1421      0.0088      0.1457      0.0075
         22      0.2745      0.0102      0.2729      0.0068
         23      0.4079      0.0116      0.4008      0.0066
         24      0.5425      0.0132      0.5298      0.0070
         25      0.6794      0.0148      0.6611      0.0077
         26      0.8192      0.0163      0.7948      0.0089
         27      0.9624      0.0178      0.9320      0.0104
         28      1.1103      0.0196      1.0740      0.0122
         29      1.2639      0.0211      1.2215      0.0141
         30      1.4251      0.0226      1.3762      0.0163
         31      1.5954      0.0243      1.5404      0.0186
         32      1.7778      0.0260      1.7158      0.0212
         33      1.9753      0.0274      1.9067      0.0236
         34      2.1936      0.0290      2.1176      0.0264
         35      2.4405      0.0304      2.3571      0.0294
         36      2.7289      0.0322      2.6373      0.0331
         37      3.0833      0.0345      2.9830      0.0372
         38      3.5571      0.0369      3.4476      0.0414
         39      4.3216      0.0401      4.2016      0.0470
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TABLE 16 



Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= CAT 4-A    Occurrences= 7    Parameters= ABILITY

First Tests

Item or
Score Group



1 

2 

3 

4 

5 

6 

7 

8 

9 
10 
ii 
12 
13 
J.4 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 

iH 

29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 



Second Tests

Std. Dev.

Mean

Std. Dev.






— A ,1 7AQ 


0 OAlfi 


-3.5219 


A A "7 'I 
0.0373 


—J . z J/ 




-3.0549 


0.034D 


— ;i . V D J/ 


n' 0*^07 

U . U J v> / 


-2 . 706 


0.0321 • 


— Z . dZUj 






0.0304 ^ 




U . UZI J 


-2 .1823 


0.0291 


0 1 AAA 

-Z .1040 


A AO 9*5 


~l>9b84 


0.027^ 


1 0 A A 1 

-1 .0^41 


n norm 


-3 ."^743 


0. 0267 


T "T A / A 

-1. 7040 


A A1 Q/. 


0.0253 


1 C 0 Q A 

-1 • 3zy4 


n m 

U « yJXOy 


-1.428t 


C.0243 


-1 .3660 




-1.2691 


0.0232 


-1,Z116 . 


A A 1 A 9 


-l.in-: 


0.0217 


-0.0646 


A A1 

0 ,0i jl 


-0.0707 


0.020 5 


-0.9226 


A A1 1 7 

0 .011/ 




0 ,y^l 9 1 


-0.7854 


A A1 A7 

0 .010/ 


' > • 


0.'^176 


-O.63I7 ' 


A A AO Q 

0 .ooyj 


-0 ''^ 


i). '1^.2 


-0. 5204 


A A^^QQ 


— ) . ' ' 


0.0U7 


-0.3911 


n no7A 
0 . 




'\ni29 


0 /* OA 

-0. 2630 


A AAAfl 




O.f 114 


-0.1360 


A A/i^9 
0 .OUjZ 


I'' ' 


o.or^3 


r\ A A A 

-0.0090 




n.l I 


0 J^O^l 


0.1179 


A AAA 9 






i\ 0 / CA 

0* Z45y 


A AAAO 


'> . - - ^' 




0 "7 CA 

0. 3 /:)U 






0 , nPiQ 


A C A C A 

0. 5059 


A AAq7 




0.^103 


0. 0 391 


n nn79 




0-iO^ 25 


A *? "7 C A 

0 . / / 54 


A nnon 

XJ . UU7U 


^ (3 /: ^ ^ 4 


^,'^144 


0. 9151 


A AinA 




0.0171 ; 


t A c: n A 
1. 0599 


A A-| 9A 




' . ' I / / 


1 ? 1 OA 


0.0143 




0.0:23 


1.3679 


0.0163 


i . * "('^ 


^.>'233 


1.5344 


0.0186 




0 .(>::p2 


1.7123 


p. 0211 


1. 


0.^-^14 


1.9051 


0.0233 






2.1171 


;O..0255 


' :L -' >4b 




2.3567 


0.0280 


2.74:i> 


l.O'ilf) 


2.6353 


0,0307 


3 . 0 ^ ^ 


n . n - 3 


2.9794 


6,0335 






3.4393 


0.0358 


4. H74 




^ 4.1861 


0.0388 
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TABLE 17 



Stability of Parameter Estimates as a Function 
of Occurrence in the Design
Tests= CTBS 2-Q    Occurrences= 10    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.


1 


A 1 An/ 

-^.1904 


U.054 7 


-4 .1765 


A An 

0 . 0690 


2 


-3.4323 


0.0507 


O / 1 / c 

-J .4145 


A A C 

O.06O5 


3 


-2.9636 




O A / O O 


A neon 

U . 05 JO 


A * 

4 


-2. 6142 


0.0437 


O C A 0 / 

-2 .5934 


O.0469 


5 ^ 




A A / A O 


0 O AOO 

. j09j 


A n/ 1 ft 

U . 0419 


r 
0 


o f\ci csn 


A A O O ^' 

0. Ojbo 


-2 .0674 


A n O "7 A 


7 


T O T C i 




-1 ,o^^9 


A n o o L 


o 
0 


-1.6828 


A f \ O A O 

0.0303 


-1.6629 


A no An 


9 


-1.5064 


A A "7 "7 

0. 0277 


-1.4874 


A A A ^ n 

O«0269 


10 


-1. 3419 


0.0248 


-1.3241 


A AO A rt 

O«0239 


11 


-1. 1869 


0.0222 


-1.1703 


0.0215 


12, 


-1.0396 


9*0194 


-1.0243 


0*0190 


13 


-0.8984 


0.0170 


-0.8847 


^ 0.0168 


14 


-0.7621 


0.014^ 


-0.7498 


0.0148 


15 


-0.6299 


0«012 3 


-0.6189 


0.0131 


16 


-0.5004 


^ 0.0105 


-0.4911 


0*013.3 


17 


-0.3734 


o.or^6 


-0.3655 


0.0096 


18 


-0.2;80 


0.(V'^'72 


-0.2417^ 


0.0085 


19 


-0.1237 


o.r»Oh3 


-0,1189 


0.0079 


20 


0.0002 




0.0033 


0,0080 ' 


21 


0.1242 


0, 00 71 


0.1257 


0 4.0086 


22 


0.2489 


0.00^6 


0.2486 


0.0092 


23 


0.3746 


O.OiOO 


0.3726 


O.01O5 


24 


0,5021 


0. "0 1 9 


0.4984 


0*0125 


25 


0.^^ j20 


0.01 '^9 


0*6265 


0.0141 


26 


0. 7^S1 


0.0 1S8 


0.7580 


0.0164 


27 


0.9017 


0, {V[ 


0.8929 


A AT O / 

0. 0184 


28' 


1.('435 , 


0,0^ H 


^ 1*0330 


A AO AC 

0 . 0206 


29 


1. 1913 




1 T "7 O A 

1 » 1789 


t\ AO O O 

() . Oz J2 


30 


i • ^7 


0. :^240 


1. 3325 


0 . 0256 


31 


1 . '> ! \ 


O.i'^274 


'"1 / A C O 

1 ,4*i?2 


n no O"? 


32 


1.^:^73 


0.020 7 


1.6697 


0,0316 


33 


1..^7^^^S 


0. )1? f 




0.0351 


^ 34 




0.0351 


2.0698 


0.0385 


35 


. . ruf> 


0.0U7 


2.3080 


0.0425 


3;^ 


:\-:2=> 


0.0AO7 


2.5867 


0.0469 


37 


^> 


0 Ou37 


2.9392 


0.0519 


33 




\ 0^*7 I 


3.3916 


0.0574 


39 


4. 1742 


0.0504 


4.1412 


0.0639 
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TABLE 18 



Stability of Parameter Estimates as a Function 
of Occurrence in the Design 
Tests= CTBS 3-Q    Occurrences= 7    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.


± 


■» U V J 




-A. 0641 


0.0540 




^ . J-f o ♦ 


0 0254 - 


-3.3190 


0,0497 




""Z • 000 / 


0 0997 


-2,8613 


0.0456 


A 




0.0202 


-2,5214 


0.0414 




^ • L.\)J\J 


0.0178 


-2,2A59 


0.0379 


0 






-2.0109 


0.0344 


/ 


-1 R99R 


0 01 37 


-1.8036 


0.0307 


o 
0 


J. • U-J JO 


0.0120 


-1.6163 


0.0278 


V 


1 A An 9 


0 01 01 


-1 4A44 


0.0245 


1 A 


-1 9Qft9 


0.0087 


-1.2844 


0.0217 


1 1 


-1 1 ASS 


0.0072 


-1.1334 


0.0189 


1 0 

IZ 


L ♦ U U U J 


0.0059 


-0.9901 


0,0162 


1 1 
1 J 


U • O UJLO 


0.00 A 3 


-0,8530 


0,0139 




— u . / *- / o 




^ -0.7206 


0,0114 


1 J 


-0 S07P 


0.0022 


-0,5920 


0.0091 


1 A 


-H 'i7A^ 


0 .OOl*^ 


-0.4667 


0,0070 


1 ' 




O.OOIA 


-0.3A40 


0.0054 


lo 


-"U ♦ ^ 4 *. 


0 0A7P 

V ( . V. / ^ 4^ 


-0.2231 


0.0043 




—0 X '^^ ^ 




-0.1033 


0,0044 


OA 






0.0159 


0.0055 




A T 0 7 7 


A nAA*^ 

\j • UU'4 J> 


0,1343 


0.0070 




A O ^ Q ■) 


A PAS9 " 


0.2536 


0.0092 




A 7 70 7 


' A no AO 


0.3736 


o.diio 


2A 


U . DvJ/O 


A '^AA7 


0.4946 


0.0126 


^ c 


U . O - J 


A OA 7 A 


0.6179 


0.0146 


OA, 


A 7 'i A 


0 00^9 


0.7439 


0,0165 


07 
w / 


A 


A' OORA 


' 0.8733 


0,0183 


0 J 


1 . Ti V D 


A (1A0S 


1.0071 


0.0198 




1 .1'.O'^ 


0.0100 


1.1464 


0.0221 


JO 


1.3078 


0.0107 


1,2926 


0.0237 


31 




0\0113 


1.4480 


0.0254 


• 32 


1,6)18 


0.0120 


1.6146 


0.0274 


33 


l.-;!'-2 


o.oi:/ 


1.7956 


0.0289 


- 34 
35 


2 .01 A'' 


O,'^! 32 


1.9964 


0.0305 




0.0137 


2.2243 


0.0321 


36 


2.5142 


^ 0.01:^0 


2.4923 


0,0337 


37 


2.8:^67 


0.0152 


2.8239 


0.0355 


33 




0.01^1 


3.2723 


0.0372 


j9 




0.0:^^9 


4.0077 


0.0389 
TABLE 19

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= ITBS 10-5    Occurrences= 7    Parameters= ABILITY

First Tests

Item or
Score Group

1 
2 
3 
A 
5 
6 
7 
8 
9 

10 
- 11 
12 
13 
14 
15 
16 
17 
18 
IQ 
20 
21 
22 
23 
24 
.25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 



Mean 

-3.8286 
-3.0943 
-2.6471 
-2.3173 
-2.0517 
-1.8259 
-1.6273 
-1.4487 
-1.2851 
-1.1324 
;-0.9889 
'-,Q.8526 ,. 

-0.72-1^^ 

-0.5951 

-0.4723 

-0. 3519 

-r'.2333 

•-0.11^1 
0. QOOo 
0.1171 
0.234f- 
U. 352^ 
0.4''3 ^ 
0.59''^3 
0. 72.'' 7 
0.;-533 



1 3 50 
44vO 



31*^'* 
.''.•■4 56 
3." v.. I 
3.>^2 i4 



Second Tests

Std. Dev.

Mean

Std. Dev.


0.0329 


-J . oUbU 


U . U J J± 


0.0304 


-3 ,0739 


A AO no 


0.0285 


-2 . 6291 


A AOQ/i 


0.02-62 


-2.3010 


A AO A1 


0.02^'t2 


~2 , 0367 


A AO 0 Q 


, 0.0224 


-1. 8126 


A A 0 0 1 

U, UZZi 


0.0205 


-1. 6157 


A no^n 
u , uzuu 


0.0186 


-1.4384 


A AT Q A 

0, OloU 


0.0172 


-1, 2761 


*v A AT O 
^ U.^OlOJ 


0.0153 


-1,1251 




0.0137 


-0.9829 


A m 0 c 


0.0120 


-0, 8474 


A AT r\ £i 


0.0104 


-0,7179 


A nno 0 

u. uuyz 


^ 0.0087 


-0, 5926 
-0,4703 


A A A7 C 


0.0073 


A A AWQ 


0.0060 


-0, JDll 


A AHA 


0.0046 


-0 . 2334 


u . uuzo 


0 , 0032 


rv 1 1 "7 1 

-0 , 11 ^ I 


U , UUZi 


0 , CO? A 


-0 , UU LA 


0 009 


0.0025 


0 * llAU 


0 00*^ S 


0,0040 


A 0 0 A 7 

0 , 230/ 


0 OOAJ^ 


0. oos?. 


0,34ol 


A A r\(i o 




0,4677 


U, UU/ H 




r\ c o A ri 

0 , 5899 


A AAOn 


0,0'"^^ 


0,7137 


A m AT 

U, UJ.U / 


0.0115 




O 01 91 


0.01 33 


U , 70 i J 


0 0136 


0.0152 


1.1240 ■ 


0.0150 


0.0170 


1.2754 


0.0167 


0.0187 ' 


0.4386 


0.0182 


'>.0210 


1.6163 


0.0197 


0.0?20 


1.8137 


0.0211 


<^OJ49 


■?-.0'^a4 


0.0228 


0.0271 


2.3034 


0.0243 


0.0297 , 


" 2,6321 


0.0260 




3.0781 


0.0274 


0.034 5 


3.8111 


0.0289 
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TABLE 20 

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= ITBS 11-5    Occurrences= 7    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.

          1     -3.9311      0.0277     -3.9010      0.0170
          2     -3.2044      0.0260     -3.1760      0.0158
          3     -2.7651      0.0244     -2.7381      0.0151
          4     -2.4430      0.0227     -2.4174      0.0140
          5     -2.1847      0.0210     -2.1604      0.0132
          6     -1.9667      0.0198     -1.9441      0.0122
          7     -1.7764      0.0181     -1.7550      0.0115
          8     -1.6059      0.0167     -1.5859      0.0107
          9     -1.4503      0.0153     -1.4317      0.0098
         10     -1.3066      0.0143     -1.2891      0.0092
         11     -1.1714      0.0129     -1.1557      0.0085
         12     -1.0441      0.0116     -1.0297      0.0078
         13     -0.922?      0.0107     -0.9094      0.0070
         14           ?      0.0094     -0.7940      0.0064
         15     -0.6?3?      0.0084     -0.6824      0.0058
         16     -0.5???      0.0076     -0.5741      0.0053
         17     -0.4761      0.0065     -0.4684      0.0046
         18     -0.3711      0.0058     -0.3644      0.0039
         19     -0.2673      0.0046     -0.2621      0.0036
         20     -0.16??      0.0041     -0.1607      0.0030
         21     -0.062?      0.0035     -0.0597      0.0026
         22      0.040?      0.0036      0.0411      0.0024
         23      0.1?24      0.0037      0.1423      0.0021
         24      0.2?59      0.0039      0.2443      0.0024
         25      0.3500      0.0047      0.3473      0.0027
         26      0.4563      0.0055      0.4517      0.0030
         27      0.5644      0.0063      0.5586      0.0039
         28      0.6?54      0.0074      0.6680      0.0044
         29      0.7???      0.0085      0.7809      0.0049
         30      0.9033      0.0099      0.8979      0.0056
         31      1.0320      0.0109      1.0199      0.0066
         32      1.1620      0.0127      1.1480      0.0074
         33      1.2997      0.0138      1.2841      0.0085
         34      1.4?69      0.0154      1.4296      0.0093
         35      1.6064      0.0169      1.5869      0.0107
         36      1.7811      0.0184      1.7596      0.0120
         37      1.9767      0.0204      1.9527      0.0134
         38      2.20?6      0.0225      2.1740      0.0145
         39      2.4654      0.0250      2.4360      0.0163
         40      2.79??      0.0275      2.7630      0.0181
         41      3.2443      0.0303      3.2081      0.0202
         42           ?      0.0336      3.9414      0.0226
TABLE 21

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= ITBS 12-5    Occurrences= 7    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.

          1     -4.1171      0.0280     -4.0844      0.0261
          2           ?      0.0267     -3.3591      0.0254
          3     -2.9???      0.0256     -2.9204      0.0245
          4           ?      0.0240     -2.5991      0.0238
          5     -2.3???      0.0229     -2.3417      0.0229
          6     -2.149?      0.0218     -2.1244      0.0222
          7     -1.9???      0.0206     -1.9346      0.0212
          8     -1.7873      0.0192     -1.7646      0.0207
          9     -1.6310      0.0180     -1.6094      0.0195
         10     -1.4860      0.0169     -1.4660      0.0189
         11     -1.3505      0.0159     -1.3317      0.0181
         12     -1.2223      0.0147     -1.2047      0.0172
         13     -1.1001      0.0137     -1.0834      0.0161
         14     -0.9827      0.0123     -0.9677      0.0154
         15     -0.8691      0.0113     -0.8556      0.0144
         16     -0.7593      0.0103     -0.7467      0.0134
         17     -0.6517      0.0092     -0.6403      0.0126
         18     -0.5461      0.0083     -0.5363      0.0117
         19     -0.4427      0.0072     -0.4339      0.0108
         20     -0.3401      0.0063     -0.3326      0.0096
         21     -0.2381      0.0055     -0.2323      0.0086
         22     -0.1371      0.0048     -0.1323      0.0077
         23     -0.0???      0.0039     -0.0324      0.0066
         24      0.0631      0.0032      0.0674      0.0059
         25      0.16?3      0.0030      0.1683      0.0051
         26      0.2700      0.0032      0.2696      0.0046
         27      0.3747      0.0036      0.3727      0.0041
         28           ?      0.0043      0.4771      0.0042
         29           ?      0.0050      0.5837      0.0046
         30      0.6???      0.0062      0.6931      0.0056
         31           ?      0.0068      0.8059      0.0066
         32           ?      0.0080      0.9224      0.0079
         33           ?      0.0088      1.0436      0.0092
         34      1.1834      0.0102      1.1703      0.0109
         35      1.3190      0.0112      1.3039      0.0126
         36      1.4631      0.0128      1.4461      0.0142
         37      1.6176      0.0140      1.5989      0.0163
         38      1.7857      0.0158      1.7643      0.0186
         39      1.9707      0.0179      1.9467      0.0213
         40      2.1786      0.0207      2.1514      0.0249
         41      2.4171      0.0236      2.3869      0.0291
         42      2.7011      0.0283      2.6664      0.0348
         43      3.0563      0.0355      3.0164      0.0430
         44      3.5396      0.0461      3.4919      0.0555
         45      4.3259      0.0629      4.2673      0.0755
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TABLE 22 

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= MAT E-F    Occurrences= 7    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.

          1     -4.5423      0.0791     -4.4961      0.1013
          2     -3.7983      0.0737     -3.7534      0.0954
          3     -3.3420      0.0689     -3.2983      0.0895
          4     -3.0036      0.0641     -2.9611      0.0838
          5     -2.7297      0.0599     -2.6887      0.0785
          6     -2.4970      0.0557     -2.4570      0.0732
          7     -2.2920      0.0524     -2.2534      0.0683
          8     -2.1080      0.0490     -2.0706      0.0633
          9     -1.9391      0.0460     -1.9030      0.0589
         10     -1.7826      0.0430     -1.7479      0.0543
         11     -1.6359      0.0404     -1.6026      0.0505
         12     -1.4971      0.0379     -1.4651      0.0466
         13     -1.364?      0.0357     -1.3343      0.0423
         14     -1.2377      0.0334     -1.2090      0.0387
         15     -1.1157      0.0313     -1.0884      0.0352
         16     -0.9973      0.0293     -0.9714      0.0320
         17     -0.8823      0.0275     -0.8579      0.0284
         18     -0.7694      0.0258     -0.7470      0.0253
         19     -0.65?0      0.0237     -0.6384      0.0222
         20     -0.5504      0.0223     -0.5317      0.0195
         21     -0.4433      0.0208     -0.4267      0.0171
         22     -0.3374      0.0195     -0.3223      0.0146
         23     -0.2320      0.0180     -0.2191      0.0128
         24     -0.1?71      0.0168     -0.1166      0.0113
         25     -0.0224      0.0161     -0.0139      0.0106
         26      0.0824      0.0153      0.0886      0.0109
         27      0.1877      0.0150      0.1914      0.0119
         28      0.2937      0.0148      0.2951      0.0136
         29      0.4010      0.0153      0.3999      0.0158
         30      0.5094      0.0150      0.5056      0.0181
         31      0.6196      0.0172      0.6131      0.0206
         32      0.7321      0.0185      0.7229      0.0235
         33      0.8467      0.0202      0.8349      0.0263
         34      0.9650      0.0225      0.9500      0.0294
         35      1.0860      0.0248      1.0684      0.0327
         36      1.2130      0.0271      1.1914      0.0362
         37      1.3443      0.0304      1.3193      0.0398
         38      1.4816      0.0335      1.4530      0.0436
         39      1.62?1      0.0?72      1.5939      0.0477
         40      1.7701           ?           ?      0.05??
         41      1.9431      0.0459      1.9031      0.0569
         42      2.1201      0.0506      2.0757      0.0619
         43      2.3136      0.0560      2.2653      0.0674
         44      2.5289      0.0623      2.4757      0.0735
         45           ?      0.0687      2.7156      0.0799
         46      3.0597      0.0763      2.9974      0.0874
         47      3.4123      0.0846      3.3453      0.0955
         48      3.8834      0.0933      3.8123      0.1041
         49      4.??37      0.1030      4.5687      0.1141
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TABLE 23 

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= MAT I-F    Occurrences= 11    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.




1 


-4.5762 


O.Ob77 


-4.4899 


0.0534 


2 


-3.8226 


0.0024 


-3.7433 


^ 0.0490 


3 


-3.3585 


0.0576 


-3.2855 


, 0.04!i2 


"4 


-3.0137 


0.0-^36 


-2.9459 


' 0.0419 


5 


-2.7345 


0.0 ■96 


-2.6717 


f 0.0388 


'6 


-2.4971 


0.0' 59 


-2.4386 


0.0359 


7 


y2.2885 


0.04 >7 


-2.2343 


0.0335 


8 


-2.1008 


0.0396 


-2.0503 


0.0311 


< 


-1.9:'92 


0.0365 


-1.8823 


0.0289 


10 • 


-1.7; 10 


0.0338 


-1,7266 


0.0268 


13 


-1.6111 


0.0311 


' -3.5811 


0.0251 




-1 .4f05 


0.0286 


-D.4435 


0.0233 


13 


-:..3^62 


0.0261 


-•] .3124 


0.0218 


14 


-i.2176 


0.0239 


"1.1869 


0.0201 


15 . 


-1.0" 40 


0.0217 


••1.0660 


0.0187 




-0,9 


0 0-94 


-0.9493 


0.0171 


, ' 17- 


-0.8 •"'7 5 


3.0172 


-0.8353 


0.0157 


la 


-0.7435 


D..Oi.5'4 


-0.7243 


0.0142 


19 


-3.6'22 ^ 


■ O.O 3 3 


-0.6157 


0,0130 


20 


-1.5- 24 


O.dlLA 


-0.5088 


0.0118 


21 


-).4]A3 


0.0(197 


-0.4033 


0.0107 


' 22 » 


-0.3071 


0.0083 


-0.2993 


0.0096 


23 


-0.2'ul - 


' 0.0.7', 


-0.1961 


0.0088 


24 




0.';":.')O 


-0.C932 


0.0083 


25 


O.OO?? 


O.O" ") 


0.0094 


0.0080. 


26 


0.]- 50 


0.' ^6 


0.1115 ■ 


0.0078 


27 


o.r2n5 


0. '( ''-1 


0.2145 


0.0083 


28 ' 


0. ^ ''>':> 


0.'-' ^■ 


0.3176 


0.0090 


29 




0. > 


0.^?15 


0.0098 


30 


'). 


0 . 1 >: s - 


0.5264 


0.0112 


31 


O.h ')3 


0.1'"^ 


0.6327 


0.0125 


32 


0. 7 » 5 


O.i - V 


0.7409 


0.0141 


33 


O." " 7 


0.. 


0 . 8511 - 


0.0156 


34 




OM^.l 


0.964] 


0.0172 


35 


l.i';'i5 


0.0 ,! 


1.0798 


0..0189 


36 


1 . M21 


0 ' • 


1.1994 


0.0209 


37 


1. r90 


0 i ; i ., 


1.3230 


0.0225 


38 


1. rin 


0.0 I' 


L.4521 


0.0245 


39 




0.',' ' ! 5 


1.5875 


0.0265 


^.0 


1 7737 


: . 0 i I 


1.7304 ^ 


0.0283 


41 


1.9., lb 


f .(i . M 


1.8829 


0.0303 


. . 42 


2.0991 


o.n ■ ; 


2.0470 


0.0320 


43 


2. 2? 18 


(.0. -. 


2.2266 , 


0.0342 


44 


2. :..49 


f . 


2.4261 


9.0363 


45 


2.7156 


( .• •> 


2.6535 


0.0383 


46 


2.iB'^8 


( .■ 


2 .9212 


0.0400 


47 


3. 3 ".20 


('. ^' 


3.2529 


0.0424 


48 


3.7/4S 


( . ! • ' 


3.7021 


0.0446 


49 


4.3149 


{ > l(« 


4.4384 


0.0465 
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TABLE 24 

Stability of Parameter Estimates as a Function
of Occurrence in the Design
Tests= STEP 4-A    Occurrences= 14    Parameters= ABILITY

                      First Tests            Second Tests
Item or
Score Group        Mean   Std. Dev.        Mean   Std. Dev.

          1     -4.1856      0.0651     -4.0365      0.0555
          2     -3.4041      0.0600     -3.2661      0.0513
          3     -2.9094      0.0555     -2.7815      0.0474
          4     -2.5312      0.0519     -2.4129      0.0439
          5     -2.2159      0.0481     -2.1071      0.0402
          6     -1.9393      0.0447     -1.8400      0.0370
          7     -1.6886      0.0417     -1.5986      0.0334
          8     -1.4559      0.0384     -1.3756      0.0299
          9     -1.2361      0.0352     -1.1652      0.0264
         10     -1.0256      0.0321     -0.9644      0.0231
         11     -0.8218      0.0287     -0.7702      0.0198
         12     -0.6222      0.0255     -0.5810      0.0169
         13     -0.4252      0.0222     -0.3946      0.0142
         14     -0.2296      0.0189     -0.2097      0.0121
         15     -0.0341      0.0161     -0.0255      0.0112
         16      0.1629      0.0140      0.1601      0.0113
         17      0.3622      0.0138      0.3474      0.0130
         18      0.5654      0.0156      0.5381      0.0151
         19      0.7735      0.0190      0.7336      0.0179
         20      0.9880      0.0238      0.9349      0.0212
         21      1.2106      0.0294      1.1444      0.0248
         22      1.4438      0.0355      1.3644      0.0286
         23      1.6906      0.0418      1.5982      0.0327
         24      1.9557      0.0485      1.8504      0.0371
         25      2.2466      0.0552      2.1289      0.0417
         26      2.5759      0.0618      2.4460      0.0467
         27
Figure 3.4.2
Figure 3.4.16
Figure 3.4.17
Figure 3.4.18
Figure 3.4.19
Figure 3.4.21
Figure 3.4.23
Figure 3.4.24
Figure 3.4.25
Figure 3.4.26
Figure 3.4.27
Figure 3.4.28
Figure 3.4.29
Figure 3.4.30
Figure 3.4.31
Figure 3.4.32
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Appendix G 

FORTRAN Program for Producing NRS Scores 
for any Collection of Items 
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C FORTRAN PROGRAM FOR ESTIMATING 

C NATIONAL REFERENCE SCALE SCORES
C FROM ANY SET OF ITEMS FROM VOL. II
C ADAPTED FROM WRIGHT AND PANCHAPAKESAN (1969)
C
      DIMENSION D(80),SI(80),B(100),SA(100),Y(100)
C INPUT K=NUMBER OF ITEMS
C INPUT D=ADJUSTED ITEM EASINESS AND
C       SI=STANDARD ERROR OF ESTIMATE OF
C          ITEM EASINESS
      SC=.00001
      NGK=K-1
C NEWTON-RAPHSON SOLUTION FOR THE ABILITY B(J) OF RAW SCORE J
      DO 8 J=1,NGK
      B(J)=0.0
      DO 7 IT=1,50
C G=EXPECTED SCORE MINUS J, GP=ITS DERIVATIVE WITH RESPECT TO B(J)
      G=-J
      GP=0.0
      DO 6 I=1,K
      P=EXP(D(I)+B(J))
      PP=1.0+P
      G=G+P/PP
    6 GP=GP+P/PP**2
      G=G/GP
      B(J)=B(J)-G
      SD=G/B(J)+G
      ANS=(SD**2)-SC
      IF(ANS.LE.0.0) GO TO 8
    7 CONTINUE
    8 CONTINUE
C STANDARD ERROR OF EACH ABILITY ESTIMATE, INCLUDING THE
C CONTRIBUTION OF THE ITEM CALIBRATION ERRORS SI
      DO 11 J=1,NGK
      V=0.0
      C=0.0
      DO 9 I=1,K
      Y(I)=EXP(D(I))/(1.0+EXP(D(I)+B(J)))**2
    9 C=C+Y(I)
      DO 10 I=1,K
   10 V=V+Y(I)*Y(I)*SI(I)*SI(I)
   11 SA(J)=SQRT(1.0/(C*EXP(B(J)))+V/C**2)
      DO 13 I=1,NGK
      ICS=IFIX(B(I)*10.+200.5)
      ECS=SA(I)*10.
   13 WRITE(6,14) I,B(I),SA(I),ICS,ECS
C OUTPUT INCLUDES:
C 1. I=SCORE GROUP NUMBER
C 2. B(I)=ABILITY ESTIMATE
C 3. SA(I)=STANDARD ERROR OF ABILITY ESTIMATE
C 4. ICS=NATIONAL REFERENCE SCALE SCORE
C 5. ECS=STANDARD ERROR OF MEASUREMENT (NRS SCORE)
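
For readers without a FORTRAN compiler, the computation in the listing can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' program: the function name `nrs_scores` and the parameters `tol` and `max_iter` are hypothetical, and the stopping rule used here (absolute size of the Newton-Raphson step) stands in for the listing's own convergence test. The inputs correspond to the listing's D (adjusted item easiness) and SI (standard error of item easiness).

```python
import math

def nrs_scores(d, si, tol=1e-5, max_iter=50):
    """Ability, standard error, and NRS score for raw scores 1..K-1.

    d  : adjusted item easiness values (D in the listing)
    si : standard errors of the easiness values (SI in the listing)
    tol, max_iter : assumed stopping controls, not taken from the listing
    """
    k = len(d)
    rows = []
    for r in range(1, k):
        b = 0.0
        for _ in range(max_iter):
            # Probability of a correct response: exp(d+b) / (1 + exp(d+b))
            p = [1.0 / (1.0 + math.exp(-(di + b))) for di in d]
            f = sum(p) - r                           # expected minus observed score
            info = sum(pi * (1.0 - pi) for pi in p)  # derivative of f w.r.t. b
            step = f / info
            b -= step                                # Newton-Raphson update
            if abs(step) < tol:
                break
        p = [1.0 / (1.0 + math.exp(-(di + b))) for di in d]
        w = [pi * (1.0 - pi) for pi in p]
        info = sum(w)
        # Extra variance from item calibration error; equals V/C**2 in the listing
        calib = sum((wi * s) ** 2 for wi, s in zip(w, si)) / info ** 2
        se = math.sqrt(1.0 / info + calib)
        nrs = int(10.0 * b + 200.5)                  # same truncation as IFIX
        rows.append((r, b, se, nrs, 10.0 * se))
    return rows

# Hypothetical example: four items of equal easiness with no calibration error
rows = nrs_scores([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
for r, b, se, nrs, sem in rows:
    print(r, round(b, 4), round(se, 4), nrs, round(sem, 2))
```

Each returned tuple mirrors one output line of the listing: score group, ability estimate, its standard error, the NRS score, and the standard error of measurement on the NRS scale.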
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